C/C++ Users Group Library 1996 July

SCI Programmers Manual                  Copyright (C) 1986, Bob Brodt

1. Introduction to SCI Programming

This section of the manual is a tutorial introduction to the C language. If you have a casual knowledge of BASIC and understand some of the fundamental concepts of programming, you should have no difficulty in following along.

This tutorial is designed to be used along with SCI, so get out your working copy of the SCI distribution diskette. You did make a backup copy, didn't you? If not, DO NOT PASS GO, DO NOT COLLECT $200 until you've read and followed the instructions in the Introduction section of the SCI User's Manual!

Now go ahead and start up SCI. The interpreter should be loading the default "shell" file, SHELL.SCI. This file simply contains a C program that is run by the SCI interpreter. It performs several functions (most of which shall remain invisible to you for the moment), but the most important is to allow you to write and test SCI programs immediately.

After the interpreter has started up, you should see SCI's program identification banner and a greater-than symbol (>), like this:

    A> SCI
    Small C Interpreter, V1.5 20Oct86 Copyright (C) 1986 Bob Brodt
    SCI Shell V1.5 20Oct86 Copyright (C) 1986 Bob Brodt
    shell>

The "shell>" tells you that SCI is now ready to accept input from you.

One of the nicest features of SCI is its ability to immediately perform any C statement that you type. As you will learn later, every C statement produces a value as a side-effect. One of the functions of SHELL.SCI is to print this value as a decimal number after the statement has been executed. Thus, you could enter some arithmetic expression like the following:

    shell> 2+2;
    4
    shell>

and have the SCI shell print the result, just like BASIC.

2. SCI Statement Structure

In the above example, notice the semicolon at the end of the line. The C language allows you to write programs without regard to "white space" (spaces, tabs and ends of lines). This means that the components of program statements can be separated by as many spaces or tabs as you like; program lines can be grouped together and separated from the rest of the program by blank lines, to show the reader that they perform a discrete function; and you can indent groups of lines following a program looping statement to show where the loop starts and ends. By allowing you to "sculpture" your program like this, C lets you write programs that are very easy to read and understand. This is very much in contrast to BASIC, which requires every program statement to start with a line number, followed by a space and then the statement, all on a single line.

Because C is such a free-form language, it would have a difficult time recognizing the end of a statement without some kind of "end-of-statement" marker. This is the purpose of the semicolon.

Now we're going to confuse you even further by telling you that SCI doesn't need a semicolon at the end of a statement! Because it's an interpreter, SCI recognizes either the end of a line or a semicolon as an end-of-statement marker. In fact, if a statement spills over onto another program line, SCI will complain - it requires that every statement be completely contained on one line. This restriction was imposed by the fact that SCI is an interpreter and not a compiler. This is an important difference between "SCI C" and "standard C" (which allows a single statement to be spread out over several lines). So if you are an experienced "C hacker", please be aware of this fact.

3. SCI Program Structure

When learning a new programming language, it's always helpful to recall fundamentals and ask yourself the question "what is a program?". Simply stated, a program is a list of instructions that tell the computer exactly what to do. A program written in the BASIC language is an ideal example of this concept: a list of instructions. The instructions are numbered to make it easy to see the order in which they'll be performed. Let's examine a fragment from a BASIC program and identify some of its key components.

    100 REM *** sort a list of numbers in ascending order ***
    110 DIM NUM(100),RSP$(80)
    120 REM get the unsorted number list
        .
        .
        .
    220 REM got 'em, now sort 'em then print 'em
    230 GOSUB 500
    240 FOR I=1 TO 100
    250 PRINT NUM(I)
    260 NEXT I
    270 PRINT "Got another list to sort?";
    280 INPUT RSP$
    290 IF RSP$="Y" THEN GOTO 120
    300 END
    500 REM *** bubble sort routine ***
    510 REM sorts the numbers in the array "NUM"
        .
        .
        .
    600 RETURN

Even the novice BASIC programmer can glance at this program fragment and tell what's happening: it starts with line 100, which is a note to the (human) reader telling him what the program intends to do - sort a bunch of numbers in ascending order. Line 110 tells the computer to reserve some memory storage we'll need later.

Remember that BASIC allows variables to be "known" to every instruction in the program. Thus, you (the programmer) cannot effectively control and limit access to variables. This makes it difficult at times to determine where in the program a variable is being set when it shouldn't be. This is a very important difference between BASIC and C, as you will find out later.

The word "GOSUB" at line 230 tells the computer to hold its place at the current location in the program, then jump to instruction number 500.
The "RETURN" at line 600 corresponds to the "GOSUB" and tells the computer to continue with the instruction following the "GOSUB". Notice that the set of instructions from line 500 to 600 is general-purpose in nature and could possibly be used in another BASIC program that required a number sorting function. However, to interface this sub-program to another program would probably require modifications to either the other program or the sub-program, or both. This makes the thought of extracting the number sorting function somewhat less attractive.

Now look at the instructions from line 270 to 290. Essentially, these ask the program user if there are any more numbers to sort, and jump back to the beginning of the program to start the process all over again. But what if the programmer decides at some later time to modify the program and accidentally deletes line 120 - the target of the "GOTO" instruction at 290? BASIC would be totally confused, since it wouldn't be able to find line 120 anymore. Although numbering program instructions, like BASIC does, is very nice and neat and makes a program easy for the human reader to follow, it can become unmanageable as the program grows in complexity.

Now let's take a look at the comparable program fragment written in C. Please don't be concerned with the details of this program at the moment, but rather focus on the overall structure:

    # *** sort a list of numbers in ascending order ***
    main()
    {
        char num[100], rsp[80];
        int i;

        while ( 1 )
        {
            # get the unsorted number list
            .
            .
            .
            # got 'em, now sort 'em then print 'em
            sort( num );
            i=0;
            while ( i<100 )
            {
                printf( "%d\n", num[i] );
                ++i;
            }
            puts( "Got another list to sort?" );
            gets( rsp );
            if ( rsp[0] != 'Y' )
                break;
        }
    }

    sort( numlist )
    char numlist[];
    {
        # bubble sort routine
        .
        .
        .
    }

The first thing that strikes the BASIC programmer when he looks at a C program is the absence of line numbers! The C language relies purely on the location of statements within a program to determine the order of program execution.

In the above example, notice the presence of the matching left and right curly braces ({ and }). These serve to bind together logical sections of the program. In particular, notice the first "{" (following "main()") and its partner towards the end of the program. These particular matching braces are used to "bind" everything between them to make one functional unit. This functional unit is called a "function" in C. Each function can be thought of as an autonomous entity - everything within the function is accessible only to statements within that function. The name of the function can be found immediately before the first "{"; in this case, the function's name is "main". Another function can be found towards the end of the program; its name is "sort". So, in contrast to BASIC, a C program is a collection of these modular functions rather than just a sequential list of instructions.

4. Functions

Think of functions as a kind of "black box" machine; raw materials, in the form of information, go into one end of the machine and a final product comes out of the other end. The inner workings of the machine are hidden and we don't really care to know how the machine works, as long as the final product is what we expected from the raw materials supplied. In C, the "raw materials" passed to a function are known as the function's "arguments" and the "final product" is called the function's "return value". C allows you to pass as many arguments to a function as needed, but the function always returns one and only one value. In the section on Variables we will see how a function can be made to seem to return more than one value.
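
The "black box" idea can be sketched in a few lines. The example below is written in present-day standard C rather than the SCI dialect, and the function name "total" and its argument names are made up purely for illustration. Three arguments go in, and exactly one value comes out.

```c
/* A hypothetical "black box": three numbers go in, one number comes out. */
int total(int first, int second, int third)
{
    /* the inner workings are hidden from whoever calls the function */
    return first + second + third;    /* the single return value */
}
```

A caller that writes total( 23, 15, 34 ) receives 72 and never needs to know how the sum was produced.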
To get SCI to execute the statements within a particular function, all you have to do is mention the function's name. In the program fragment shown above, you would type either "main()" or "sort()" at the SCI prompt. The parentheses following a function's name serve two purposes: they distinguish the entity as being the name of a function as opposed to a variable; and they show SCI where the function's arguments start and end. If a function does not require any arguments (as in "main()" above), you still need to supply the left and right parentheses. If a function requires more than one argument, each argument is separated from the preceding one with a comma (,) like so:

    func( 23, 15, 34 )

Note that the spaces are optional!

4.1 Library Functions

Beyond using it as a rather dumb integer calculator, you can use the SCI shell to test out any valid C statement with support from a large collection of built-in functions. As you work through this tutorial, you will be introduced to many of these, referred to hereafter as "Library Functions" (see the section on Library Functions for more details). You may, if you like, think of the Library Functions as being analogous to BASIC's built-in commands like "PRINT" and "INPUT". Most of the Library Functions are similar to those shipped with "industrial strength" C compilers, so many of the programs you write under SCI should be transportable with some minor changes.

4.1.1 putd()

The Library Function "putd()" prints a number, or the result of a calculation, on the console screen. Try entering the following commands from the shell:

    putd(123)
    putd( 235 + 12370 )

In the first example, the argument passed to "putd()" is the number 123. The function should have printed "123" on the console screen. In the second example, the argument is the sum of 235 and 12370.
Note that this calculation is performed first, then the result is passed to "putd()" for printing.

Below the numbers that were printed by "putd()" you should have seen a zero printed as well. This zero is the value returned by "putd()" and was printed by the shell. In this case, the return value of a function was not particularly useful. We were more interested in the side-effect of this function, namely the displaying of a number on the screen.

4.1.2 getchar()

The Library Function "getchar()" waits for a single keyboard key to be pressed, then returns the value (in ASCII) of that key. At the shell prompt, try typing "getchar()", hit a carriage return and then hit the letter 'a' key. You should see the number 97 printed by the shell, the ASCII value in decimal of the character 'a'. Unlike "putd()", this function required no arguments and returned a useful value.

4.1.3 puts()

The function "puts()" is used to print a sequence of characters (known in C jargon as a "string") on the console. A string is represented in C as a bunch of characters enclosed in quotes ("), just as in BASIC. Try the following command, and be careful to type the string exactly as it appears here:

    puts("hello world\n")

Look closely at the string again and notice the backslash (\) just before the letter 'n'. This two-character combination (\n) is standard C shorthand notation for a "newline" character. Newlines have the effect of performing a carriage return plus linefeed on the console. Had we omitted the "\n" from the string, "puts()" would have just printed "hello world" and left the cursor on the same line, after the "d" in "world". SCI provides other similar shorthand notations, which will be explained in a later section.

5. Your First Program

Now it's time to write your first program.
If you haven't already done so, read the Editor section of the User's Manual and perform the installation as required for your particular computer. If you are unsuccessful in getting the editor to work properly, you can create the sample programs with your favorite text editor, then start up SCI and load the program file. This will be tedious and time consuming, but it may just give you enough understanding of C to perform the editor installation properly. If all else fails, appeal to the author for help!

5.1 Hello again, world!

Using either the built-in editor or a separate text editor, create the following program:

    hi()
    {
        puts("hello, world\n");
    }

Now, from the shell, type the name of the function, "hi()". You should see the following on your screen:

    shell> hi()
    hello, world
    0
    shell>

If instead you are rewarded with an error message followed by a question mark, you did something wrong! Hit a carriage return or two to get back to the shell's "shell>" prompt, go back into the editor, fix the mistake and try it again.

Whether you realize it or not, this exercise is an important first step for learning a new programming language. It teaches you all of the routine motions you will be going through to write programs and gives you confidence to continue on.

5.2 Celsius to Fahrenheit

Next, type in the following sample program:

    fahr(celsius)
    {
        return 9 * celsius / 5 + 32;
    }

This is a simple celsius to fahrenheit temperature conversion function. Notice here that the symbols for multiplication (*) and division (/) are the same as in most other programming languages. Try executing this function with a few different celsius values. Each time the argument is converted to fahrenheit and is returned to the shell to be printed.
As an exercise, modify the program to print the fahrenheit value and return a value of zero!

6. Statements: Simple and Compound

In C, a "statement" is just what you might expect: an imperative instruction to the computer to perform some calculation. Statements are generally some kind of arithmetic expression followed by a semicolon (or the end of line in SCI) - we have encountered them before. The C language also allows you to group together several of these "simple" statements and treat them as a single "compound" statement. This is done by placing left and right curly braces ({ and }) around the simple statements. Let's look at the example below:

    {puts("hello ");puts("world\n");}

Here, everything within the left and right braces, and the braces themselves, is treated as a single statement in C. The C language also lets us write the above statement like this:

    {
        puts( "hello " );
        puts( "world\n" );
    }

Notice that the program becomes much easier to read when each statement is written on a separate line. Also notice that we have indented the two simple statements from the braces. Indenting is the accepted way of conveying the intended structure of a program. We are in effect saying that these two lines "belong together" and should be treated as a single unit.

The compound statement in the above example was obviously created for demonstration purposes only. If it had been encountered by itself in a real program, the braces would have been superfluous and would not have altered the behavior of the program. However, earlier we encountered an instance where the curly braces were required, namely immediately following a function definition. Later on when we discuss program flow control, we will again sing the praises of compound statements.
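
To see where the braces stop being superfluous, here is a small sketch in standard C (not the SCI dialect). It borrows an "if" test from the flow-control material that comes later, and the names "say", "greet_grouped" and "greet_ungrouped" are invented for this illustration only. Each function reports how many times it printed something.

```c
#include <stdio.h>

/* Hypothetical helper: prints a string and returns 1 so calls can be counted. */
int say(const char *text)
{
    fputs(text, stdout);
    return 1;
}

/* The braces turn both calls into one compound statement governed by the "if". */
int greet_grouped(int enabled)
{
    int calls = 0;

    if (enabled) {
        calls += say("hello ");
        calls += say("world\n");
    }
    return calls;
}

/* Without braces, only the first call belongs to the "if". */
int greet_ungrouped(int enabled)
{
    int calls = 0;

    if (enabled)
        calls += say("hello ");
    calls += say("world\n");    /* executed whether or not "enabled" is set */
    return calls;
}
```

With a zero argument, the grouped version prints nothing at all, while the ungrouped version still prints "world".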
We will now make just one more point concerning compound statements and the SCI shell. From the shell, type the following two statements:

    shell> puts("hello "); puts(" world");
    shell> {puts("hello "); puts(" world");}

In the first instance, you saw that only the word "hello" was printed, followed by the shell's "shell>" prompt. This is because the interpreter executes only the first statement it finds in the input line buffer. Since a statement is terminated by a semicolon, the second call to "puts()" was never seen. In the second example, the interpreter saw the left curly brace, recognized the entire line as a single statement, and executed both calls to "puts()".

6.1 Comment Statements

Comment statements are completely ignored by C and may be used liberally anywhere within a program for documentation purposes. Standard C uses the two-character combinations /* (pronounced "slash-star") and */ to mark the beginning and ending of comment statements:

    2 + /* this is a comment */ 2 + 2;

The /* and */ need not necessarily be on the same program line, as for example:

    2 + 2 + 2;  /*
                this is a comment
                */

SCI uses the number symbol (#) to introduce comment statements. A comment in SCI begins with a # and ends at the end of the line. Being an interpreter, SCI requires that comments appear on a single line, so only a comment start symbol is needed. The above example might appear in SCI like this:

    2 + 2 + 2;  #
                # this is a comment
                #

Be careful when placing comments, because everything to the right of the first # symbol on the line is ignored by SCI. For example, the following comment would not work as expected:

    2 + # this is a comment # 2 + 2;

7. Expressions

Expressions can be thought of as components of a C statement - the values and operators that, when evaluated, yield a result.
The most common example that comes to mind is an arithmetic expression:

    2 + 3 - 5

An expression becomes a statement if we simply tack a semicolon at the end of it, thus:

    2 + 3 - 5;

7.1 Operators

Since C provides a plethora of operators, we will not discuss them all in this section but rather introduce them as they become relevant to the discussion. If you have a burning desire to discover all of C's operators, see the Appendix. First, we will define some commonly used terms.

7.1.1 Binary Operators

The term "binary operator" does not refer to bits and bytes but rather to the class of operators that require two (hence "binary") operands. Some of these you have probably already seen if you are familiar with other programming languages, like the addition (+), subtraction (-), multiplication (*) and division (/) operators.

7.1.2 Unary Operators

Unary operators perform their functions on only one operand. The subtraction symbol (-) is used as a unary operator when it stands in front of a number or a variable, like so:

    -45

You may also use the plus sign (+) as a unary operator, although it would be superfluous since all numbers are assumed to be positive unless preceded by a minus sign. C also provides other unary operators that will be discussed later.

7.2 Precedence

If you will recall, in your high school algebra class you learned that in an arithmetic expression containing a combination of addition, subtraction, division and multiplication, the division and multiplication are always done before addition and subtraction. That is to say that division and multiplication "take precedence" over addition and subtraction. This property of precedence extends to all operators in the C language, not just the arithmetic operators.
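
A few one-line functions make the precedence rule concrete. This is a sketch in standard C rather than the SCI dialect. The names "product_first" and "negate_then_add" are made up for illustration, and "fahr" simply repeats the conversion formula from the "Your First Program" section with explicit "int" types added.

```c
/* Multiplication takes precedence over addition: evaluated as 2 + (3 * 5). */
int product_first(void)
{
    return 2 + 3 * 5;               /* 17, not 25 */
}

/* The unary minus applies to 45 alone, then the binary + is performed. */
int negate_then_add(void)
{
    return -45 + 50;                /* 5 */
}

/* Multiplication and division are done before the addition of 32. */
int fahr(int celsius)
{
    return 9 * celsius / 5 + 32;    /* fahr(100) is 212 */
}
```

Because these are integer calculations, the division discards any remainder: fahr(37) gives 98, since 333 / 5 is 66 in integer arithmetic.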
You may defeat the normal order of evaluation of an expression by using parentheses, just as in modern algebra:

    (2 + 3) * 5

This will perform the addition first, then the multiplication. You may use as many matched sets of parentheses as necessary to disambiguate the order of evaluation:

    ( ( (2+3) / 2 ) * 5 )

In fact, it is a good idea to use parentheses liberally whenever you are unsure of operator precedence.

7.3 Associativity

You also learned (hopefully in the same algebra class) that expressions are always evaluated from left to right. This same rule applies to expressions in C. This property of operators is known as associativity. In C, most of the binary operators are evaluated from left to right, while the unary operators are evaluated from right to left.

7.4 Arithmetic operators

Now we are finally prepared to formally introduce C's arithmetic operators. They are listed here in order of decreasing precedence:

    * / %       multiplication, division and modulo
    + -         addition and subtraction

Most of these should already be familiar to you. The modulo operator (%) gives the remainder from the division of the left value by the right value. For example, the result of:

    15 % 8

is 7.

7.5 Bitwise Operators

C also offers these bit-manipulation operators (again listed in decreasing precedence):

    << >>       left and right SHIFT
    &           bitwise AND
    ^           bitwise exclusive OR
    |           bitwise OR

If you have a need to do bit manipulation but are not familiar with the above terms (SHIFT, AND, OR and exclusive OR), you should probably consult a textbook on computer programming since this is beyond the scope of this tutorial.

We will be learning more about other C operators in later discussions.

8. Variables

Previously, we had only alluded to the fact that C allows you to create named data storage locations (a.k.a. "variables"); now we will formally introduce you to all of C's data types. Except for the pre-defined Library Functions and the editor's system-variables (which are found in SHELL.SCI), all variables must first be made known to the program before they may be used. Unlike BASIC, where a variable comes into existence the very first time it is used in a statement, C requires that every variable be formally declared before you may use it within your program. This section will cover the fundamentals of C variable declarations.

8.1 Naming Conventions

The precise rules governing the naming of variables usually vary from one C compiler to another. The rules for SCI variable names are as follows:

    1.  a variable name may contain any number of characters from the set of:

        1.  the letters "a" through "z" and "A" through "Z".

        2.  the underscore (_).

        3.  the digits "0" through "9".

    2.  the first character of a variable must not be a digit (i.e. it must be either a letter or an underscore).

    3.  the case of a letter is significant, for example: "foobar" is not the same as "Foobar" or "FooBar".

    4.  a variable name may be as long as you like, but there is a limit of 79 characters per line imposed by the interpreter.

8.2 Data Types

The C language supports many different types of variables. The most notable difference between them is the amount of memory storage each one addresses. The least amount of memory a variable can represent depends on the type of computer the program is written for. Typically, this is a byte of information, although some mainframe machines do not have the capability to access memory in smaller than 2 or 4 byte gobbles.
Most personal computers, however, can access memory one byte at a time and in C, this data type is known as the "char", short for "character".

8.2.1 Char

A "char" variable in SCI is one byte long and can represent a number between -128 and +127. In order to make a variable known to the program we must first declare it, so to declare a "char" variable named "foobar" we would write:

    char foobar;

We can also declare more than one variable of the same type on the same line by separating each with a comma, like so:

    char foobar, snafu, gurgle;

8.2.2 Int

Another variable type is the "int", short for "integer". Again, the amount of memory an "int" addresses is machine dependent. In SCI, an "int" addresses two bytes of memory, and can represent a number between -32768 and +32767. "Int"s are declared in a manner similar to "char"s:

    int foobar;
    int snafu, wowbagger;

Standard C also defines other data types such as floating point variables, double precision integer and double precision floating point. You may also define your own data types that are a combination of these primaries (known as "structures"). Unfortunately, none of these are supported by this version of SCI.

8.3 Scope

If you are familiar with BASIC, then you already know that a BASIC program variable is "known" throughout the program - that is, any statement within the program may alter a variable's contents. This "feature" can lead to some programming bugs that are very difficult to find. For instance, you may use a variable as a temporary loop counter in one section of the program, only to discover later that you had already decided to use that variable for another purpose and its contents were continually being destroyed. Ideally, we would like to be able to use variable names indiscriminately in one section of a program without having to worry about whether the variable name is being used in another section of the program.
Happily, the C language offers this ability, as you will soon see. This concept of limited (or rather "controlled") access to variables is known as "scope".

8.3.1 Global Variables

In C, you may create variables that are known throughout the program, just like in BASIC. Variables that have this property are known as "globals" and, just like BASIC, every statement within the program may retrieve and store the value of a global variable. A variable will attain global status if it was declared outside of any curly braces ({ and }) that delimit the body of a function. Here is an example to illustrate:

    char c;                 # "c" is a global
    int i, j;               # and so are "i" and "j"

    a_function()            # the first function in the program
    {
        .
        .
        .
    }

    char flag, nyuk;        # some more global variables

    another_function()      # another function
    {
        .
        .
        .
    }

As you can see, C does not care where within a program a global variable is declared as long as the declaration appears outside of any functions. SCI ensures that global variables are always set to zero before the program starts up. This is pretty much standard behavior for most C compilers as well.

In C, functions are also considered to be globals - they are known throughout the program, although they obviously can't be used to store data.

8.3.2 Local Variables

Variables that are declared inside of the curly braces that mark the beginning and end of a function are known as "local" variables. Locals exist only during the life of the function - that is, the variable comes into existence after it has been declared within a function and ceases to exist when the function returns to its caller. See the example below for clarification:

    char c;                 # these are global variables
    int i;

    a_function()            # a function definition
    {
        char snafu;         # a local variable
        int x, y;           # some more locals
        .
        .
        .
        x = c;              # copy the global to a local
    }                       # snafu, x and y cease to exist here!
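
The difference between the two kinds of variable can be tried out with a short sketch. It is written in standard C rather than the SCI dialect (so comments use /* */ and the functions carry explicit types), and the names "calls", "square" and "calls_so_far" are invented for this illustration.

```c
int calls;                  /* global: set to zero before the program starts */

int square(int n)
{
    int result;             /* local: exists only while square() is running */

    result = n * n;
    calls = calls + 1;      /* any function may read and update a global */
    return result;
}

int calls_so_far(void)
{
    /* "result" is unknown here; only the global "calls" can be seen */
    return calls;
}
```

After calling square() twice, calls_so_far() returns 2, while the local "result" from each call has long since ceased to exist.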
Variable declarations must appear immediately after a left curly brace; if a declaration appears anywhere else within the body of a function, SCI will warn you about a "syntax error".

In addition, variables may be declared within any compound statement in a function, but the declarations must appear immediately after the opening brace. Variables declared in this context exist only for the life of the compound statement, i.e. to the matching closing brace. Thus the memory these variables occupy can be re-used within the function. Below is an example to illustrate:

    func()
    {
        char a;
        int i;
        .
        .
        .
        if ( i==0 )
        {
            int j;          # declare an "int" in a compound stmt
            .
            .               # "a", "i" and "j" are all locals here
            .
        }                   # "j" no longer exists here
        else if ( i==1 )
        {
            char j;         # a different "j" than above
            .
            .
            .
        }
    }

SCI ensures that locals are always zero just after they have been declared. On standard C compilers, the initial contents of locals are unknown, so do not depend on them being zero.

8.3.3 Function Arguments

Function arguments are also considered to be local variables. When a function calls another function and passes it an argument, the argument's contents are copied into a local variable in the called function - the value of the caller's argument is not affected. This is best illustrated with an example:

    char c;                 # the global variable, "c"

    func1()
    {
        char c;             # a local may have the same name as a global
        c = 1;              # and this sets the LOCAL variable "c" to 1!
        func2(c);           # now call func2
    }

    func2( x )
    char x;
    {
        char c;             # this "c" is different from func1's "c"
        x = 3;              # this does not affect func1's "c"
        c = 5;              # this does not affect the global "c"
    }

8.3.4 System Globals

As mentioned earlier, the Library Functions and the editor's configuration variables that are declared in the shell are also globals.
These, however, are more permanent than program globals. A program's global variables can be zapped into non-existence simply by editing the program and removing the statement that declares them. System globals cannot be destroyed since SCI will not allow you to modify the shell program (or any program for that matter) while it is still running.

8.4 Location of Variables

At this point it may be useful to discuss where in memory each of these different types of variables is located. Although this depends on the compiler's implementation and the hardware, most C compilers take advantage of some commonly used data structures.

8.4.1 The Stack

The stack is simply a chunk of the computer's memory that can only be accessed (read from and written to) indirectly through a machine register known as the "stack pointer". If the CPU cannot provide a stack pointer register, the authors of the C compiler will typically write some subroutines in the machine's language to emulate a hardware stack.

Reading and writing to the stack proceeds as follows: before an item is read from the stack, the stack pointer is decremented to point to the previous item in the stack memory. This, then, is the item read from the stack. After an item is written into stack memory, the stack pointer is incremented to point to the next item in the stack. Thus, the operation of the stack can be thought of as a stack of pancakes - numbers are piled onto the stack for temporary storage, then removed from the top as needed.

In general, the C language depends very heavily on the stack. Local variables, including function arguments, are piled onto the stack when a function begins, and then removed and discarded when it terminates. As a C statement is executed, the components of the statement (constants, variables, etc.) are "pushed" onto the stack until they are needed.
Then, when the statement is evaluated, the components are "popped" off the stack. Most compilers take advantage of the machine's built-in stack (if the CPU happens to have one, as most do), so access to the stack is very efficient. Still, this has become a major point of criticism by opponents of the C language.

8.4.2 Program and Data Segments

Global data variables are usually stored in the same section of memory as program code; most 8 and 16 bit CPUs do not provide separate memory segments for program code and global data. Some minicomputers and most mainframes do provide separate program code and data memory areas. The machine then limits access to these segments: the program is prevented from storing data in the code segment, which could otherwise cause the program to go berserk. Also, the program is limited to accessing only its own global data area, and any attempt to read or write data outside of this global data segment is a violation. Alas, a microcomputer's operating system is at the mercy of the currently executing program, and a careless program has the ability to corrupt the operating system and bring the computer to its knees.

9. Constants

You already know about decimal integer constants because we have been using them throughout this tutorial. The C language also allows you to represent numbers in hexadecimal, octal and ASCII.

9.1 Hexadecimal Constants

Hexadecimal numbers are distinguished from other number representations and variables by preceding them with a "0x" (zero-"ex"), for example:

     0x0
     0x1b
     0xfa70

are all valid hexadecimal number representations. You may also use an upper case "X" in "0X" and upper case "A" through "F" if you desire.

9.2 Octal Constants

Octal numbers are distinguished by preceding them with a zero.
These are all valid octal numbers:

     00
     033
     0175160

9.3 ASCII Character Constants

The numeric value of ASCII characters can be represented by surrounding the ASCII character in apostrophes, like this:

     'A'   is equivalent to decimal 65
     ' '   is a space and is equivalent to decimal 32

Certain non-printing ASCII characters can also be conveniently represented as character constants. By preceding certain lower case letters with a backslash character ("\"), the two-character combination can be used to represent a single one-byte value. One of these you already know as the "newline" character, '\n'. Here is a complete list of these:

     '\b'   "backspace", equivalent to decimal 8.
     '\r'   "carriage return", equivalent to 13.
     '\n'   "newline", equivalent to 10.
     '\f'   "formfeed", equivalent to 12.
     '\t'   "tab", equivalent to 9.

In addition, you can represent any ASCII character as a character constant using its octal equivalent preceded by a backslash. The only restriction here is that the octal representation must be exactly 3 octal digits. For example:

     '\033'   is an ASCII "escape" character
     '\101'   is an ASCII "A", equivalent to 65
     '\377'   is equivalent to -1 (all eight bits set in a signed "char")

and so on - you get the idea.

9.4 String Constants

Finally, another type of constant you have already been using is the "string" constant - a bunch of ASCII characters surrounded by quotes, for example:

     "this is a string\n"

A string always ends with a zero byte, thus the amount of memory a string takes up is equal to the number of characters you can count in the string plus one. In the example above, the string requires 18 bytes of storage (realize that the "\n" sequence is a single character - the "newline"!).
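If you have a standard C compiler at hand, the equivalences claimed in this section can be checked with a small sketch like the one below. It is written for standard C rather than for SCI, it assumes an ASCII machine, and the function names "example_string_bytes" and "constants_agree" are made up for this illustration.

```c
#include <stddef.h>

/* Bytes occupied by the tutorial's example string, including the
   zero byte that C appends to every string constant. */
size_t example_string_bytes(void)
{
    return sizeof("this is a string\n");   /* 17 characters + 1 */
}

/* Returns 1 when the different spellings of the same values agree
   (assumes the ASCII character set). */
int constants_agree(void)
{
    return 0x1b == 27          /* hexadecimal: leading "0x" */
        && 033 == 27           /* octal: leading zero */
        && '\033' == 27        /* octal escape: ASCII "escape" */
        && '\101' == 'A'       /* octal escape for "A" */
        && 'A' == 65
        && ' ' == 32
        && '\n' == 10;
}
```

Here "sizeof" is a standard C operator that this tutorial has not introduced; it is used only to count the bytes of the string constant, zero byte included.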
String constants have an interesting numeric equivalent - it is an address in the computer's memory where the ASCII characters in the string can be found by functions that are equipped to deal with them. For instance, the Library Function "puts" expects its parameter to be an address in memory where ASCII characters can be found and sequentially printed out to the console screen. If you tried to find out a string constant's numeric value from the shell by typing:

     shell> "hello?"
     4380
     shell> "another string..."
     4380
     shell> "what the?"
     4380
     shell>

you would be surprised to find that they all have the same address - how could this be? Actually, all the string constants in the above examples do have the same address. Recall that the shell reads a line of input from the console and hands it off to the interpreter for evaluation. Since the strings all get read into the same line buffer by the shell, they all have the same address, namely the shell's input line buffer. We will discuss strings in more detail in the section on arrays and pointers.

10. Assignment Operator

In C, we assign values to variables using the "assignment" operator, "=". Do not confuse the assignment operator (a single equal sign) with the "is equal to" relational operator (two consecutive equal signs), which we will discuss later. Although C will allow you to use one in place of the other under certain conditions, you will get unexpected results. The expression:

     a = (b + 1) * 2;

is read as: take the result of the calculation (b + 1) * 2 and assign it to the variable "a". Note that there may be only one variable to the left of the equal sign. You can, if you like, string several of these assignments together like this:

     a = flg = x = (b + 1) * 2;

Note that even here there is always only one variable to the left of each equal sign.
This statement is evaluated like so: take the result of the calculation (b + 1) * 2 and assign it to the variable "x", then assign the same number to the variable "flg", and then to "a". This implies that the assignment operator is evaluated from right-to-left, instead of the usual left-to-right. In fact it is the only binary operator supported by SCI that exhibits this peculiar behavior. This feature is most useful when initializing several variables, like so:

     lettercnt = digitcnt = punctcnt = 0;

which would set all of the variables to zero. The assignment operator has the lowest precedence (it is performed last in an expression) of all the C operators, except for the "comma" operator (see below).

10.1 Lvalues and Rvalues

It should be intuitively obvious that any attempt to store a value in what we know as a C constant is illegal. In other words, you would never attempt to, say, store the number 3 in place of the number 5:

     5 = 3;

The same holds true for string constants; you may not store another string in an existing string constant:

     "hello" = "world";

These types of data (constants) are collectively known as "rvalues" (pronounced "are-values"). The term rvalue stems from the fact that they may only be used on the right-hand side of an assignment operator. On the other hand, variables do allow numbers to be stored in and retrieved from them. This category of data is known as "lvalues" (pronounced "ell-values") because they may be used on the left-hand side of an equal sign. We will encounter lvalues and rvalues again in a later discussion.

11. Comma Operator

In standard C the punctuation character "," (comma) is considered to be an operator, although it does nothing more than ensure that sub-expressions within a statement will be evaluated in order from left to right.
This operator has the lowest priority of all. It is useful when you want to do more than one thing in a statement, like the following:

     ++a, b=12, c=b+a;

Of course, we could also have written the above statement as:

     {
          ++a;
          b=12;
          c=b+a;
     }

but this would not have been as concise as the first form. Commas are also used to separate variable names in a data declaration, as you have already seen, and to separate arguments in function calls.

NOTE: SCI only accepts commas as separators between variable names in a declaration or between arguments in a function call. It does not support the comma operator itself, as used in the first example above; any attempt to use it will result in a "syntax error" message from the interpreter.

12. Flow Control

The previous sections have dealt only with data elements (variables and constants) and with evaluating arithmetic combinations of these. C would be a poor language indeed if it only allowed a programmer to evaluate a sequential list of arithmetic expressions without giving him the opportunity to act on the results of these calculations. This section will introduce you to C's program control structures, also known as "flow control" structures. Most of these have constructs that should be familiar to all you BASIC hackers: the conditional ("if-else"), looping ("while" and "for") and program control switching ("switch").

12.1 if and if-else

The most fundamental of the flow control constructs is the "if". This allows you to perform a statement (or a group of statements, in the case of a compound statement) "if" the given condition is true. In C, a condition is considered to be true when the value of an expression is non-zero, that is, either a positive or a negative number. It follows, then, that an expression that evaluates to zero is considered to be a false condition.
We write an "if" conditional in C like this:

     if ( <expression> ) <statement>

We will be using the angle brackets (<>) to represent familiar C concepts so that you will be able to more easily identify the relevant components: here "<expression>" is any valid C expression, like "var-5" or "x + 10"; and "<statement>" may be either a simple or compound statement - but more about statements later. The relevant components in the "if" statement are: the "if" word, which identifies this flow control construct; a left parenthesis followed by an expression followed by a right parenthesis; then a C statement. Now a few words about syntax:

1. The "if" must be in lower case letters; C compilers will not accept "If", "IF" or "iF" (and neither will SCI!).

2. The matched left and right parentheses must be included, and SCI requires that the "if", the left parenthesis, the <expression> and the right parenthesis appear on the same line in the program.

3. The <statement> may be either a simple statement or a compound statement.

NOTE: SCI requires that the "if" and the parentheses appear on the same line in your program text, but the <statement> may appear on the following line. This is only a restriction of the SCI interpreter - the standard C language lets you put as much horizontal and vertical "distance" between elements of an "if" statement as you like.

The "if" flow control construct behaves as follows:

1. The <expression> is evaluated.

2. If the result of <expression> is true (a non-zero value) then the <statement> is executed.

3. If the result is false (zero) the <statement> is skipped and program control passes to the next statement.

For example:

     if ( 2 + 2 ) a = 5;

would always set the variable "a" to 5 because the <expression>, which evaluates to 4 in this case, is always non-zero.
And the <statement> in this example:

     if ( 0 ) a = 5;

would never be reached because the <expression> is always false. As a last example, look at this:

     if ( a = b + c ) b = b + 1;

Here the value of the <expression> depends on the result of the addition of "b" and "c", which we have no way of knowing just by looking at the example out of context. As a side-effect, the result of the addition is stored in the variable "a". It is very important that you realize that the equal sign in "a = b + c" is not making a comparison between the value of "a" and "b + c", as you might assume if you were looking at a similar statement in BASIC. In other words, we are not saying "if a is equal to the sum of b and c".

12.1.1 More About Statements

Earlier we promised to tell you more about the concept of C statements. A statement, as you already know, can be either an expression such as "a = a + 5" followed by a semicolon (don't forget the semicolon!) or it may be a group of these simple statements surrounded by left and right curly braces, like this:

     {
          a = b + c;
          b = b + 1;
     }

or this:

     { a = b + c; b = b + 1; }

or a compound statement within a compound statement as in this example:

     {
          {
               a = b;
               b = 1;
          }
          a = b + c;
          b = b + 1;
     }

We now expand our definition of a C statement to include the "if" and later all of the other flow control constructs as well. In other words, the template for an "if" we showed you before:

     if ( <expression> ) <statement>

can be thought of as a single unit and used wherever a <statement> is used. Now we can write multiple "nested if's" like this:

     if ( a + 5 )
          if ( b - 1 )
               c = 0;

which would be "read" by the computer as follows:

1. if "a + 5" is true (non-zero) then go to step 2, otherwise go to step 3.

2. if "b - 1" is true, then assign 0 to "c".

3. go on to the next statement in the program.
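The three steps above, and the earlier warning about "=" inside a condition, can be tried out with a sketch like the following. It is written for a standard C compiler, not for SCI, and the function names "nested_if" and "assign_then_test", as well as the use of -1 to mean "never assigned", are choices made for this illustration only.

```c
/* Mirrors the nested "if" above: "c" becomes 0 only when both
   "a + 5" and "b - 1" are non-zero. */
int nested_if(int a, int b)
{
    int c = -1;              /* -1 means "c was never assigned" */

    if (a + 5)
        if (b - 1)
            c = 0;
    return c;
}

int a;                       /* global, so the stored value can be inspected */

/* "=" in a condition assigns first; the stored value is then tested
   for being non-zero. No comparison with "a" takes place. */
int assign_then_test(int b, int c)
{
    if ((a = b + c))         /* extra parentheses: the "=" is intentional */
        b = b + 1;
    return b;
}
```

Calling assign_then_test(1, 2) stores 3 in "a" and returns 2, while assign_then_test(1, -1) stores 0 in "a" and leaves "b" at 1, because a stored zero counts as false.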
The "if" statement also has an optional "else" clause, which looks like this:

     if ( <expression> ) <statement> else <statement>

Notice that the "else" keyword must also be in lower case letters. Again, the first <statement>, the "else", and the second <statement> may be on separate lines of program text or they may all be on the same line. The "if-else" statement, as it is called, is read as follows:

1. if the <expression> is true, then execute the first <statement> and then go to step 3.

2. else, execute the second <statement> and then go to step 3.

3. go on to the next statement in the program.

The C language allows you to nest "if-else" statements as deeply as you wish, for example:

     if ( a + 3 )
          if ( b + 5 )
               if ( c + 7 )
                    d = 0;
               else
                    d = 1;
          else
               d = 2;
     else
          d = 3;

It should be obvious from the way the statements were indented how each "else" matches its "if". As a matter of definition, an "else" clause matches the nearest preceding "if" that does not already have an "else". If there are more "else's" than "if's" in a program, then this is an error condition and you will be warned by SCI. If you are ever unsure how nested "if-else" combinations will match up, you can always use curly braces to bind them together the way you want:

     if ( a + 3 )
     {
          if ( b + 5 )
          {
               if ( c + 7 )
                    d = 0;
               else
                    d = 1;
          }
          else
          {
               d = 2;
          }
     }
     else
     {
          d = 3;
     }

You can also use another "if-else" statement in the "else" clause of a preceding "if-else", for example:

     if ( a + 3 )
          d = 1;
     else
     {
          if ( a + 4 )
               d = 2;
          else
          {
               if ( a + 5 )
                    d = 3;
               else
                    d = 4;
          }
     }

Because it is a matter of personal style, there are no hard and fast rules to follow when indenting program statements like this.
However, the above example is more commonly written like this:

     if ( a + 3 )
          d = 1;
     else if ( a + 4 )
          d = 2;
     else if ( a + 5 )
          d = 3;
     else
          d = 4;

This saves you from running off the right edge of the screen when writing very deeply nested "if-else's" and it looks very much like a multi-path switch (an "ON GOTO" statement in BASIC).

12.1.2 Relational Operators

Sometimes it is necessary to change the program flow depending on whether a variable is equal to a certain value. The C language lets you compare two expressions using a set of operators (similar to the addition, multiplication, assignment, etc.) known as the "relational operators". If we wanted to know whether a variable were equal to a certain value, for instance, we could say:

     a == 5

The "==" operator (which is read as "is equal to") compares the two items to the left and right of it and leaves a value of one if they are equal, zero if they are unequal. So the value of this expression would be one if "a" is equal to 5 and zero otherwise. Thus, the statement:

     if ( a == 5 ) b = 0;

would set "b" to zero only if "a" equals 5. We promised you a set of these operators, so here they are:

     operator:   read as:
        ==       is equal to
        !=       is not equal to
        <        is less than
        >        is greater than
        <=       is less than or equal to
        >=       is greater than or equal to

Notice that there may be no spaces between the two equal signs (=) in the "is equal to" operator nor between the exclamation point (!) and the equal sign in the "is not equal to". If there is a space between them, they will be assumed to be two separate operators and you will get a "syntax error" message from SCI. Needless to say, the same goes for the <= and >=.

12.1.3 Logical Operators

The C language also allows you to combine groups of relational expressions with "and" and "or" clauses.
For example, given two sets of conditions you can determine if both are true, or if at least one is true. These "and" and "or" clauses are known as the "logical operators" in C:

     operator:   read as:
        &&       and
        ||       or

Notice that there may be no spaces between the two ampersands (&) and vertical bars (|). The expression:

     a==5 && b==3

will be true only if "a" is equal to 5 and "b" is equal to 3.

     a<0 || 10<a

In this example the expression is true if "a" is less than 0 or greater than 10 (do you see that 10<a and a>10 are identical?).

One final note about the logical operators: standard C stops evaluating an expression that contains logical operators after the truth or falsehood of the expression is known. For example:

     a==b && c==d

Assume that "a" is not equal to "b". Standard C would not even bother checking the relation "c==d" because the "and" clause requires both expressions to the left and right of the "&&" to be true - if one of the components is false, the entire expression is false. So, since the first component encountered ("a==b") was found to be false, the truth or falsehood of the entire expression is already known and there is no need to evaluate "c==d".

NOTE: SCI is not as smart as a standard C compiler and blindly evaluates every sub-expression in a logical expression. This can lead to some very hard to find errors if for example you alter a variable in one of the subexpressions of a logical operation - sorry folks!

12.1.4 Precedence and Associativity

You can combine as many "and" and "or" clauses as necessary:

     a==5 || b==3 && c==4

This expression will be true under one of two conditions: 1) b is equal to 3 AND c is equal to 4, or 2) a is equal to 5. This example shows a very important property of logical operators that we have already encountered in the discussion of the arithmetic operators (+, -, *, etc.), namely precedence.
In C, the && operator takes precedence over (is performed before) the || operator. As with the arithmetic operators, the logical operators are performed from left to right. So, if we have more than one operator of the same precedence in an expression:

     a==5 && b==5 && c==5

we know that C performs the tests for equality from left to right. Although we haven't come right out and said it before, it should be obvious from the examples that the relational operators have higher precedence than the logical operators. Specifically, >, >=, < and <= have higher precedence than == and !=, which have higher precedence than &&, which has a higher precedence than ||. Please refer to the Language Summary section of the User's Manual for a complete list of C operators and their order of precedence.

Furthermore, relational operators associate from left to right, although it is hardly ever necessary to use more than one consecutive relational:

     a == 5 != 1

Notice in this example that "a == 5" is performed first, which would result in either a zero or a one; that result is then compared with 1. Examine the following expression closely:

     a < 5 == b < 5

This expression will be true if "a" and "b" are both less than 5 or "a" and "b" are both equal to or greater than 5. As always, whenever you are in doubt about the associativity or precedence of operators, either use parentheses to bind operands and operators together, or consult the table of operators in the User's Manual.

12.1.5 Examples

Finally armed with these new facts about C, we are ready to try some practical examples using the SCI interpreter.
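As a first warm-up, here is a sketch for a standard C compiler (not for SCI) that spells out, with explicit parentheses, the grouping that the precedence rules give to two of the expressions discussed above, and that shows standard C skipping the right-hand side of "&&". All of the function names and the counter "calls" are invented for this illustration.

```c
/* How C groups  a==5 || b==3 && c==4  ("&&" binds tighter than "||"). */
int or_and(int a, int b, int c)
{
    return (a == 5) || ((b == 3) && (c == 4));
}

/* How C groups  a < 5 == b < 5  ("<" binds tighter than "=="). */
int same_side_of_five(int a, int b)
{
    return (a < 5) == (b < 5);
}

int calls;                    /* counts how often touched() has run */

int touched(void)
{
    calls = calls + 1;
    return 1;
}

/* Standard C never calls touched() when "a" differs from "b".
   According to the NOTE in the Logical Operators section, SCI would
   evaluate it anyway. */
int short_circuit(int a, int b)
{
    return a == b && touched();
}
```

The explicit parentheses change nothing for a standard compiler; they only make visible what the precedence table already dictates, which is also the safest habit whenever you are in doubt.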
Enter the following program using the SCI editor:

     convert(n)
     {
          char c;

          puts("to decimal (d), hex (x) or octal (o) ?");
          c=getchar();
          if(c=='d')
               putd(n);
          else if(c=='x')
               putx(n);
          else if(c=='o')
               puto(n);
          else
               puts("what?\n");
     }

As you can probably tell, this is a number conversion routine. It asks the operator whether to convert the number passed to it ("n") to decimal, hexadecimal or octal. Now at the shell prompt, type:

     shell> convert( 255 )
     to decimal (d), hex (x) or octal (o) ?x
     0xff
     0
     shell>

If your screen did not look like the above, there is something wrong - fix it up and try again. Now modify the program to accept either lower or upper case d's, x's and o's (hint: use the || operator).

12.2 while

One of the things that makes a computer such a powerful tool is its tireless ability to perform repetitive tasks. This is why every programming language has some sort of "looping" flow control. The C language offers three types of loop constructs: the "while", "for" and "do-while". This version of SCI only supports the "while" and "for". The "while" flow control looks something like this:

     while ( <expression> ) <statement>

Notice that its structure is very similar to the "if" construct, and all of the same syntactical rules apply as well. This statement is executed as follows:

1. evaluate <expression> and if it is false, go to step 4.

2. execute <statement>.

3. go to step 1.

4. go on to the next statement in the program.

As with the "if" statement, <expression> is considered to be true if it evaluates to non-zero, and false if it is zero. Let's look at an example:

     loop()
     {
          int a;

          a = 10;
          while ( a > 0 )
          {
               putd( a );
               a = a - 1;
          }
          puts("all done\n");
     }

This loop will get executed exactly 10 times - each time the variable "a" is decremented by one until it equals zero.
When "a" reaches zero, the expression "a > 0" will become false and program execution will continue with the next statement in the program, namely the statement puts("all done\n");.

There is also a more direct method of breaking out of a "while" loop without having to wait until control returns to the <expression> evaluation and testing. Using a "break" statement, you can directly jump out of a "while" and continue with the next statement in the program. The following demonstrates the use of a "break":

     loop()
     {
          int i, q;

          i = 12345;
          while (1)
          {
               q = i/10;
               putchar( i-q*10+'0' );
               if ( q==0 )
                    break;
               i = q;
          }
     }

Since the <expression> is always true (1 is always non-zero), this loop would be repeated until the cows came home. The "if" statement within the loop will break out of the loop when the quotient from the division results in zero. We leave as an exercise for the student to figure out what this little program does.

12.3 for

The "for" looping construct is similar to the "while". Its format is as follows:

     for ( <expression> ; <expression> ; <expression> ) <statement>

The syntactical requirements of the "for" construct are similar to those of the "while" - the "for" and the parentheses and everything between them must be on the same program line. Also, the three <expression>'s inside the parentheses must be separated from each other by two semicolons as shown.

Actually, the "for" is simply a method of clearly presenting to the reader the most commonly needed elements relevant to a program loop: an "initialization" part, a "loop test" part and an "iteration" part. These three elements are clearly identifiable, and correspond to (reading from left to right) the three <expression>'s within the parentheses. A "for" statement would be executed as follows:

1. evaluate the first <expression>, disregard the result.

2. evaluate the second <expression> and if it is false, go to step 5.

3. execute <statement>.

4. evaluate the third <expression> and go to step 2.

5. go on to the next statement in the program.

We could have written the loop in the first example given for the "while" using a "for" statement:

     loop()
     {
          int a;

          for ( a = 10; a > 0; a = a - 1 )
               putd( a );
     }

The "break" statement may also be used to break out of a "for" loop. One last interesting feature of the "for" is that any or all of the three <expression>'s may be missing. If the second <expression> is missing, the "loop test" will always evaluate to true. Thus, the second example of the "while" loop above could have been written:

     loop()
     {
          int i, q;

          for ( i=12345; ; i=q )
          {
               q = i/10;
               putchar( i-q*10+'0' );
               if ( q==0 )
                    break;
          }
     }

Here, the three <expression>'s are:

1. i=12345

2. the second <expression> is missing, so the loop ends only through the "break"!

3. i=q

And the most concise way of writing a "forever" loop is:

     for ( ;; )
          puts( "hello, world\n" );

12.4 switch

The last and most complex flow control we will examine is the multi-path "switch" statement. The "switch" is similar to BASIC's "ON GOTO" statement. Here is the template of a "switch" statement:

     switch ( <expression> )
     {
     case <constant expression> :
          <statement>
     case <constant expression> :
          <statement>
          .
          .
          .
     case <constant expression> :
          <statement>
     default :
          <statement>
     }

Again, the word "switch", the left parenthesis, the <expression> and the right parenthesis must be on the same program source line. The matching left and right curly braces at the beginning and end of the switch are not strictly required by the language, but in practice they are necessary, as you will soon see. The words "case" and "default" are only meaningful within the context of a "switch" statement. There may be any number of "case <constant expression> :" sequences but only one "default :". The <constant expression>'s are simply <expression>'s that contain only constants (no variables!).
So, the following would all be examples of <constant expression>'s:

     2 + 2
     25 * (365 / 7)
     37/12 > 10

NOTE: Standard C requires that only <constant expression>'s follow a "case"; however, SCI allows you to use any valid <expression> as an added bonus. Keep this fact in mind when writing programs that will eventually be transported to standard C!

The "switch" statement behaves as follows:

1. evaluate the <expression>.

2. compare the result of <expression> to each of the <constant expression>'s after the "case's", sequentially from top to bottom.

3. if the value of <expression> matches one of the <constant expression>'s, continue program execution with the statement immediately following the colon. All other "case" labels and their <constant expression>'s are ignored.

4. if none of the <constant expression>'s match <expression>, jump to the <statement> immediately following the word "default".

5. if a "break" statement is encountered, jump to the end of the "switch" statement (the <statement> immediately following the }).

Although this seems complicated at first, a "switch" is really just a multi-way program jump. It allows you to jump to anywhere within a statement, based on the value of an expression. Here's an example of a switch:

     convert(n)
     {
          char c;

          puts("to decimal (d), hex (x) or octal (o) ?");
          switch ( getchar() )
          {
          case 'd':
               putd(n);
               break;
          case 'x':
               putx(n);
               break;
          case 'o':
               puto(n);
               break;
          default:
               puts("what?\n");
          }
     }

NOTE: Standard C allows you to place the "default" word anywhere within the "switch", and program control will jump there only after all of the "case <constant expression> :"'s have been checked and no match found. Here again, SCI dares to be different! If a "default" is encountered before a matching "case", the program continues with the <statement> following the "default".
Therefore, it is a good idea to always place your "default" statements at the end of the "switch".

13. Arrays

An array in C is a block of contiguous memory locations (meaning they are located "one after the other" in memory) that all have the same type ("char" or "int") and can be accessed individually. These individual data items in an array are known as the array's "elements". You have already used arrays earlier, in your very first C program, namely the sequence of ASCII characters in the string "hello, world\n". The array's elements are the ASCII characters 'h', 'e', 'l', etc. This array, however, could not be used to store any information other than that sequence of characters, just as the integer constant 5, say, cannot be used to store a different number. In this section we will show you how to create and use arrays for data storage and retrieval.

Arrays are declared in a similar fashion to simple variables, but following the array's name you must indicate how many elements the array will have. The size of an array is constant: once it has been declared it cannot be changed. Below is an example of an array declaration:

     char vartable[15], macnam[100];

This statement declares two arrays that have 15 and 100 elements respectively. The square brackets ([ and ]) identify the variable as being an array; they are also required when you wish to access one of the array's elements:

     c = vartable[ 5 ];

In C, array elements are counted from zero instead of one, so the above statement takes the sixth element of "vartable" and stores it in the variable "c". To access the first element of the array, we would write:

     vartable[0] = 35;

Consequently, the last element of the array would be "vartable[14]" and not "vartable[15]".
In fact, if you did attempt to store a number in "vartable[15]", you would overwrite some unknown location in memory that was already being used as storage for another variable or, worse yet, that was part of your program code. The results of overrunning a C array like this are unpredictable and are dependent on the environment the C program is running in (the type of machine, the C compiler used, etc.).

When the name of an array is used by itself without the square brackets, as in:

     i = vartable + 5;

it is taken to be a pointer to the first element of the array. In the example above, "i" would be assigned the memory location of the sixth element in the array, and not the value of this memory location.

The Library Function "gets()" reads a line of input from the console keyboard and places the characters at the address pointed to by its argument. This function waits for the user to hit a carriage return before it returns to the caller. The input line is always terminated with a zero byte by "gets()" and the carriage return is stripped out. This makes it suitable for printing by its partner, "puts()". Try the following program:

     greet()
     {
          char name[ 80 ];

          puts("hello, what's your name? ");
          gets( name );
          puts("nice to meet you, ");
          puts( name );
          puts(". Have a nice day!\n");
     }

14. Pointers

The last example above leads us directly into our next discussion. In C, you have the ability to access a memory location by "pointing" at it with a variable. This type of variable is known as a "pointer" in C.

NOTE: The number of bytes of storage a pointer variable needs depends on the environment the C program is running in. All 8-bit personal computers (8080, Z80, 6502, etc.) use 2 bytes for pointer variables. The IBM-PC, which has an 8088 CPU, may use either 2 or 4 bytes for pointer variables, depending on the C compiler used. SCI always uses 2 bytes.
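The statements made so far about array sizes, about an array name used by itself, and about pointer sizes can be checked on a standard C compiler with a sketch like this one. It is not SCI code: it relies on the standard "sizeof" and "&" (address-of) operators, which this tutorial has not introduced, and the three function names are hypothetical.

```c
#include <stddef.h>

/* Number of elements in the tutorial's "vartable": valid indexes
   therefore run from 0 to 14. */
size_t vartable_elements(void)
{
    char vartable[15];

    return sizeof(vartable) / sizeof(vartable[0]);
}

/* The array name by itself is the address of element 0, so adding 5
   gives the address of the sixth element, vartable[5]. */
int name_plus_five_is_sixth(void)
{
    char vartable[15];
    char *p = vartable + 5;

    return p == &vartable[5];
}

/* Bytes needed by a pointer on the machine this is compiled for:
   2 in SCI, commonly 4 or 8 with current compilers. */
size_t pointer_bytes(void)
{
    return sizeof(char *);
}
```

Note that nothing in the sketch reads or writes "vartable[15]"; as explained above, doing so would overrun the array.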
C knows about the type of variable being pointed at from the pointer's declaration:

     char *char_pointer;
     int *int_pointer;

Above we declared two variables, "char_pointer" and "int_pointer". The asterisk (*) in the declaration statement identifies the variables as being pointers. The "char" and "int" keywords define the type of variable that the pointer points to. To illustrate:

     i = *int_pointer;

This would retrieve the integer (two bytes in SCI) found at the memory location addressed by the contents of "int_pointer", and store it in "i". In this example, we can't tell what will be stored in "i" because we don't know what the contents of "int_pointer" are. Recall from an earlier discussion that SCI always initializes its variables to zero, so in this case "i" would contain the two bytes found in memory locations zero and one. Unless you know what is in memory locations zero and one, this information is not very useful.

The power of pointers lies in the fact that they can be made to point at an array, like this:

     char vartable[15], *cp;

     cp = vartable;
     c = *cp;
     cp = cp+1;
     d = *cp;
     cp = cp+1;
     e = *cp;

Here we have set the character pointer "cp" to point at the first element of the array "vartable", and assigned the contents of this first element to the variable "c" just by letting "cp" point at it. By simply adding one to "cp" we have made it point at the next element in "vartable".

14.1 Lvalues and Rvalues Revisited

Pointers are useful because they can be changed (bent?) to address any location in memory, whereas arrays are fixed and always point to their first element. For example, if you tried to do this:

     char a[10];

     a = a + 1;

you would get an error message. The C language does not allow you to change the value of an array variable.
If it did, there might be the possibility of your program "forgetting" where the data in the array is located. Thus arrays can be put in the same category as strings and integer constants, namely "rvalues" (see an earlier discussion on Lvalues and Rvalues). The attempt to change the array variable "a" in the example above would therefore reward you with a "need an lvalue" error message from the SCI interpreter. Pointers, on the other hand, are more analogous to variables - they can be modified and are therefore considered to be "lvalues".

14.2 Pointer Operator

The asterisk in the examples above is known as the "pointer" operator. This is a unary operator and is used in the same manner as the negation (-) unary operator. The pointer operation tells C to treat its associated pointer variable as a memory address and to store or retrieve the data item at that address. Note that the operand of a pointer operator must have been declared as a pointer, or C will complain. This is in your own best interest because we humans tend to forget little details like this. For example, if you wrote:

     char c, d;

     d = *c;

you would get a "not a pointer" error message from SCI. Now, try typing in this little program using the SCI editor:

     prints(s)
     char *s;
     {
        while(*s)
        {
           putchar(*s);
           s=s+1;
        }
     }

The Library Function "putchar()" accepts a single argument and prints the ASCII representation of this argument to the console screen. This program takes a pointer to a character string as an argument. Then, while the character being pointed at is non-zero, "putchar()" prints the character onto the console. The pointer is then incremented (s=s+1) so that it points at the next character in the string.
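For readers who want to try the same loop outside SCI, the following is a sketch of "prints()" in present-day standard C. The logic is the one just described; the prototype-style declaration, the "const" qualifier and the returned character count are additions made for this sketch (the count is there only so the behaviour can be checked) and are not part of the SCI version.

```c
#include <stdio.h>

/* Print the string at "s" one character at a time.
   Returns the number of characters written (added for this sketch). */
int prints(const char *s)
{
    int count = 0;

    while (*s) {            /* stop at the terminating zero byte */
        putchar(*s);
        s = s + 1;          /* point at the next character */
        count = count + 1;
    }
    return count;
}
```

Because "s" is a copy of the caller's pointer, advancing it inside the function does not disturb the caller's own variable.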
Now execute the following command from the shell:

     shell> prints("hello, world\n");

This should have resulted in the words "hello, world" printed on the console.

Except for the fact that pointers may be changed and arrays may not, C treats both of them identically. For example, if we have the following two data declarations:

     char c, ca[ 10 ];
     char *cp;

we can choose to view the array variable "ca" as a pointer, and the pointer variable "cp" as an array in our programs, like so:

     c = *ca;
     cp[ 5 ] = c;

So, we could have written the sample program from before like this:

     prints(s)
     char *s;
     {
        int i;

        for ( i=0; s[i]; i=i+1 )
           putchar( s[i] );
     }

This would have left the pointer argument "s" unaltered when the "for" loop was finished. This is sometimes necessary, as in the following example:

     prints10(s)
     char *s;
     {
        int i, j;

        for ( j=0; j<10; j=j+1 )
        {
           for ( i=0; s[i]; i=i+1 )
              putchar( s[i] );
        }
     }

Here the string passed to "prints10()" is printed on the console ten times. If we had incremented the character pointer "s" instead of using the index "i", then the second pass through the outer loop would have started with "s" pointing past the last character of the string.

14.2.1 Pointer Expressions

We stated that C knows the type of data item a pointer is pointing to ("char" or "int"). Assume that we have a pointer variable named "ip"; then in this example:

     i = *(ip + 5);

we are trying to retrieve the data item pointed at by "ip" and offset by 5. If "ip" was a pointer to an "int", we are retrieving the sixth 2-byte integer of the array pointed at by "ip". Physically, this would be the eleventh and twelfth bytes of the array. Notice that this expression is functionally equivalent to:

     i = ip[ 5 ];

C allows you to perform certain mathematical and relational operations on pointers, specifically:

     1. you may add a constant or an expression that evaluates to a constant to a pointer.

     2. you may subtract two pointers, but only if they point to the same type of data. For instance, you may not subtract a "char" pointer from an "int" pointer. The result is the number of elements between the two pointers.

     3. you may compare two pointers using the relational operators (==, !=, <, <=, > and >=).

All other mathematical operations on pointers are not allowed.

On occasion you may wish to use integer constants as pointers, like this:

     char c;

     c = *0x0100;

When a constant is used as the target of a pointer operation, SCI will access the two bytes (an "int") at the location specified by the value of the constant, in this case at location 100 hex. You can get as creative as you want when using constants as pointers:

     char c, offs;

     c = *((0x101 + offs) * 2);

Note that SCI will allow you to use variables that have not been declared as pointers in pointer expressions if they are enclosed in parentheses as shown above.

NOTE: Standard C does not allow you to use constant expressions as pointers; any attempt to do so will usually result in an error message.

14.2.2 printf()

C functions do not intrinsically know whether their arguments are printable strings or just an array of numbers, like BASIC does. Therefore there are no C functions that are analogous to BASIC's PRINT statement, which prints either a number or a string depending on its argument. The standard C Library Function "printf()", however, performs a print operation similar to BASIC's PRINT statement.

"Printf", which is read as "print-eff", stands for "print formatted". It is the most unusual standard C function because it accepts a variable number of arguments, depending on the contents of its first argument. The first argument to "printf" is a string of characters and is known as the "control string".

"Printf" works like this: it scans through the characters in the control string and prints them out on the console screen; if a percent symbol (%) is encountered in the control string, the character following the "%" determines how printf's next argument will be processed. For example, if the character following the "%" is a lower case letter "s", the next argument is assumed to be a character string and is printed out to the console instead of the "%s" character combination. A "%d" combination in the control string takes its next argument to be an integer and prints its decimal value. "Printf" keeps track of which arguments have already been printed by a "%-letter" combination, so that the next "%-letter" combination affects the next argument in the argument list. For example, try the following command from the shell:

     printf("%s, did you know %d*%d is %d?\n","Bob",376,49,376*49)

The "%s" conversion treats the second argument, "Bob", as a string (which it is!) and prints it; the first "%d" grabs the next argument in the list (376) and prints it as a decimal number; the second "%d" prints 49 as a decimal number; finally, the third "%d" takes the product of 376 and 49 and prints it as a decimal number. What should have appeared on your screen is:

     Bob, did you know 376*49 is 18424?

"Printf" also recognizes these other conversion codes:

     %x   prints its argument as a hexadecimal number. The characters "0x" do not appear in the printed number; you must add them if needed like so:

               printf( "%d = 0x%x\n", 376, 376 )

     %o   prints its argument as an octal number.

     %c   prints its argument as an ASCII character.

14.3 Address Operator

In an earlier discussion about the scope of variables, we said that when a variable is passed to a function, the called function creates a clone of the caller's variable and copies its contents into this local variable.
This way the function can not alter the contents of the caller's variable. How then can we have a function alter the contents of our local variables if it becomes necessary? There are several options open to us: 1) write the function so that it returns a value which we can then assign to our local variable, 2) have the function put the value into a global variable which we can access, or 3) pass the function the address of our local variable.

The first option only allows the function to pass back one piece of information, thus limiting its usefulness. Option 2 is an acceptable method but forces the function to become more dependent upon the entire program structure. This is fine for application-specific functions, but severely restricts the modularity of general purpose functions. The accepted method is to pass the function the address of our variable using the "address operator", "&", like so:

     prog()
     {
        char c;

        func( &c );   # call "func()", pass the address of "c"
     }

     func( ptr )
     char *ptr;
     {
        *ptr = 12;
     }

The ampersand (&) as used above is a unary operator that tells C we want to use the address of the associated variable instead of its contents. In other words, the address operator yields a pointer to its associated operand. That's why we declared the argument to the function "func()" as a pointer to a "char". In the above example then, the local variable "c" in "prog()" would have been set to 12 after the call to "func()".

Extreme caution must be exercised when passing pointers to variables like this. If we had instead declared "ptr" as a pointer to an "int" (int *ptr;) in the function "func()", then the assignment "*ptr = 12;" would have destroyed the memory location following "c" and possibly caused the program to crash.

Note that you may only use the address operator on lvalues; this means only simple variables and pointers.
If you try to use the address operator on a constant, a string or an array you will evoke a "need an lvalue" error from SCI.

Now enter and test this little program from the SCI interpreter:

     words()
     {
        char *word, *cp, linebuf[80];

        puts("type some words: ");
        gets(cp = linebuf);
        while(cp)
        {
           cp = parse(cp, &word);
           puts("word = <");
           puts(word);
           puts(">\n");
        }
     }

     parse(str,word)
     char *str;
     int *word;
     {
        while(*str==' ')
           ++str;
        *word=str;
        while(*str!=' ' && *str)
           ++str;
        if(*str==0)
           return 0;
        *str=0;
        return str+1;
     }

and from the shell execute the function "words()". What does this program do?

14.3.1 scanf()

The companion to the Library Function "printf()" is "scanf()" (read "scan-eff"). This function performs the reverse operation of "printf()", that is, it converts strings and numbers read from the console keyboard and places them into program variables. This means that its arguments must be pointers to the appropriate data type. Examine the following program:

     getname()
     {
        char firstname[20], lastname[20];
        int zip;

        puts("First name last name and zipcode?");
        scanf("%s %s %d", firstname, lastname, &zip );
        printf("firstname = %s\n",firstname);
        printf("lastname = %s\n",lastname);
        printf("zipcode = %d\n",zip);
     }

Scanf assumes that a string is a stream of consecutive non-blank characters. The function will not stop reading input until all of its conversion (%-letter) codes have been satisfied, or until an end of input (control-Z in MS-DOS) is encountered. This means that carriage returns do not cause scanf to quit reading and return to the caller. Carriage returns are simply treated like spaces and tabs, collectively known as "white space". In the program above, you could have entered your first and last name on separate lines if you like, or on the same line separated by one or more spaces or tabs.
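The white space rule can be tried without a keyboard by using the standard C function "sscanf()", which applies the same conversions to a string in memory. The sketch below is written for a standard C compiler rather than SCI; the function names "parse_name()" and "zip_from()" and the "%19s" field widths (which keep each name inside its 20-byte array) are assumptions made for this example.

```c
#include <stdio.h>

/* Converts "first last zip" from a text buffer. Returns the number of
   fields converted (3 on success). Note that "zip" is already a
   pointer here, so it is passed on without an "&". */
int parse_name(const char *input, char *firstname, char *lastname, int *zip)
{
    return sscanf(input, "%19s %19s %d", firstname, lastname, zip);
}

/* Usage: returns the zip code found in "input", or -1 if fewer than
   three fields could be converted. */
int zip_from(const char *input)
{
    char firstname[20], lastname[20];
    int zip;

    if (parse_name(input, firstname, lastname, &zip) != 3)
        return -1;
    printf("firstname = %s\nlastname = %s\nzipcode = %d\n",
           firstname, lastname, zip);
    return zip;
}
```

Calling zip_from("Bob\nBrodt 12345") converts all three fields even though the names are on separate lines, because the newline counts as white space just as a space or tab does.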
Notice that the address of the integer variable "zip" was passed to scanf; do you now understand why? If we had used the following statement instead:

     scanf( "%d", zip );

then the contents of "zip" would have been used as the location where scanf would place an integer value. If the contents of "zip" had been zero, then memory locations zero and one would have been altered by scanf, possibly damaging the operating system.

15. Increment and Decrement Operators

As we have already seen, we can use several methods to access elements in an array. Let's say we had an array and wanted to access its elements sequentially, one after the other. We could either declare a pointer to the array and then increment the pointer by one, or we could declare an integer variable to be used as an array index and increment it each time by one, like so:

     char a[10], *p;
     int i;

     # set all 10 elements in the array "a" to zero,
     # using a pointer:
     p=a;
     i=0;
     while(i<10)
     {
        *p = 0;
        p=p+1;
        i=i+1;
     }

     # ...and using an index:
     i=0;
     while(i<10)
     {
        a[i] = 0;
        i=i+1;
     }

Since these operations come up often in programming, the C language offers a very efficient method of incrementing and decrementing variables. These are, appropriately enough, called the "increment" and "decrement" operators, "++" and "--". The increment/decrement operators are unary operators and can appear either before or after a variable name, like this:

     int i;

     ++i;     # increment "i" by one
     i++;     # same thing
     --i;     # decrement "i" by one
     i--;     # and again

If the operator appears before the variable, it is known as a "pre-" increment or decrement; if it appears after the variable, it is a "post-" increment/decrement.
The pre-increment/decrement operators perform their function before the variable is used in the expression, whereas the post-increment/decrement operators perform their function after the variable has been used in the expression. This is best explained with an example:

     incdec()
     {
        int i;

        i = 0;
        putd( ++i );
        i = 0;
        putd( i++ );
     }

The first call to "putd()" would print a 1 - the variable "i" was incremented by one before its value was passed to "putd()". Now we set "i" to zero again and the second call to "putd()" will print a 0. This is because the post-increment operator passes the value of "i" to the function "putd()" before it gets incremented by one.

These operators are very handy for quickly scanning through an array like this:

     prints(s)
     char *s;
     {
        while ( *s )
           putchar( *s++ );
     }

When using the increment/decrement operators on pointers, SCI knows what data type the pointer is referencing and adjusts the pointer so that it points to the next/previous data item. In other words, when incrementing a pointer to an integer, the pointer is incremented by two instead of one so that it points to the next integer. If we wanted to print out all the numbers in an integer array, we might do something like this:

     dump()
     {
        int array[ 10 ], *ap, i;

        ap = array;
        for ( i=0; i<10; ++i )
           printf( "%d\n", *ap++ );
     }

16. A Tour Through the File I/O Functions

The formal definition of the C language does not really include any of the Library Functions we have discussed so far. However, most of these have become de facto standards and are considered a part of the language's support library.
Although the exact usage of support functions may vary from one compiler implementation to the next, most C compilers adhere to a "standard" to some degree. This section discusses SCI's implementation of the file I/O functions, which is fairly compatible with the "standards" proposed by the authors of the C language.

This section only attempts to clarify some points concerning the file I/O functions and is not meant as a reference. Please refer to the section in the User's Manual titled "The Library Function" for exact details about these functions.

16.1 fopen()

Before a file can be used (read from or written to), it must first be "opened" with the Library Function "fopen()". Opening a file ensures that the file exists and is readable/writable and prepares internal data structures for dealing with the file.

Before a C program starts up, three "files" are opened for it by the "operating system". These are known as the "standard input", "standard output" and "standard error" files. These usually default to the user's console keyboard and screen. The "standard error" file is an output file and always defaults to the user's console screen. It is usually used by the program to display error messages. The reason we like to have two output files is that we don't want to intermix program error and informational messages with program output data, and we want to ensure that error messages always appear on the user's console.

To open a file, "fopen()" must be called with two arguments: the file name, and a character string that defines how the file is to be accessed. For example,

     int channel;

     channel = fopen( "SHELL.SCI", "r" );

would open the file "SHELL.SCI" for reading ("r"). MS-DOS allows the file name to be in upper or lower case; other operating systems may not be so indifferent about file names.
The second string, known as the "open mode", must contain either a lower case "r" to open the file for reading, "w" for writing or "a" for appending. When a file is opened for reading, it must already exist or "fopen()" will return an error code. If a file is opened for writing, it may or may not exist; if it does exist, it is deleted before the open. If a file is opened for appending and the file exists, it is opened for writing, but data is written to the end of the file. If the file does not exist, an open for append acts like an open for write. You may also open a file for both reading and writing, meaning you may intermix read and write functions on the same file, but see the section on the Library Functions for more information.

The value returned by "fopen()" is an integer known as the "channel number" and points to the previously mentioned internal file control data structure. This channel number is then used by the other file read/write functions to access the file. If "fopen()" was unable to find the file (file opened for reading) or the file could not be created (file opened for writing), it returns a zero, indicating failure.

The special channel numbers 1, 2 and 3 may be used to read and write from/to the standard input, output and error files respectively. These channel numbers may be used with the file read/write routines at any time, unless of course you have closed them.

16.2 fclose()

When a file is no longer needed, it should be "closed" by the program. Closing a file ensures that the file is safely stored on disk and it frees up the internal file control data structure. This function expects a single argument, the file pointer (the channel number returned by "fopen()"):

     fclose( fp );

and returns a zero if the file was closed successfully, or a -1 if an error occurred (the file was never opened, disk is write protected, etc.).
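The three open modes can be tried with any standard C compiler using the sketch below. Be aware that standard C differs from SCI here: "fopen()" returns a "FILE *" rather than an integer channel number, it returns NULL rather than zero on failure, and "fclose()" returns EOF rather than -1 on error. The function names and the scratch file name passed in by the caller are assumptions made for this example.

```c
#include <stdio.h>

/* Writes one line with mode "w", adds a second with mode "a", then
   reopens with "r" and counts the characters in the file.
   Returns the count, or -1 if any open fails. */
long write_append_count(const char *path)
{
    FILE *fp;
    long count = 0;

    fp = fopen(path, "w");          /* create, or empty an existing file */
    if (fp == NULL) return -1;
    fputs("first\n", fp);
    fclose(fp);

    fp = fopen(path, "a");          /* writes go to the end of the file */
    if (fp == NULL) return -1;
    fputs("second\n", fp);
    fclose(fp);

    fp = fopen(path, "r");          /* the file must already exist */
    if (fp == NULL) return -1;
    while (fgetc(fp) != EOF)
        count = count + 1;
    fclose(fp);

    remove(path);                   /* tidy up the scratch file */
    return count;
}

/* Returns 1 if opening "path" for reading fails, as it should when
   the file does not exist. */
int open_missing_fails(const char *path)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL) return 1;
    fclose(fp);
    return 0;
}
```

Had the second open used "w" instead of "a", the first line would have been discarded and only the 7 characters of the second line would remain.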
16.3 fgetc() and fputc()

The functions "fgetc()" and "fputc()" are used to read and write, respectively, a single character from/to a file. Both of these functions advance a "file position pointer" which points to the next character to be read from or written to the file. This file position pointer is one of the items in the aforementioned internal file control data structure. These functions are used like so:

     copy(fromfile,tofile)
     char *fromfile, *tofile;
     {
        int fromchannel, tochannel, c;

        fromchannel = fopen( fromfile, "r" );
        tochannel = fopen( tofile, "w" );
        while ( (c = fgetc( fromchannel )) != -1 )
           fputc( c, tochannel );
        fclose( fromchannel );
        fclose( tochannel );
     }

This little program copies the file whose name is the string at "fromfile" to the file whose name is at "tofile". The function "fgetc()" returns the character that was read from the file (a single byte value from 0 to 255) or a minus one if the end of the file was reached or if some other error occurred. The function "fputc()" returns the character that was written or a minus one if an error occurred.

16.4 fgets() and fputs()

You may also read and write disk files a "line" at a time with the functions "fgets()" and "fputs()". A "line" is a sequence of characters in the file that ends with a newline ("\n") character. These are similar to the functions "gets()" and "puts()", which read and write from/to the standard input and standard output. Note that the two function calls fputs("hello",2) and puts("hello") are identical.

16.5 fread() and fwrite()

Sometimes it is useful to be able to read or write a file in arbitrarily long "blocks".
For example, suppose we wanted to store an array of integer numbers in a file. The character read/write functions ("fgetc()" and "fputc()") would work but would be less efficient than writing several characters at a time. The functions "fread()" and "fwrite()" are ideal for these situations:

     sortfile()
     {
        int array[ 100 ];
        int channel;

        channel = fopen( "NUMBERS.DAT", "r" );
        fread( array, 200, channel );
        fclose( channel );
        sort( array, 100 );
        channel = fopen( "NUMBERS.DAT", "w" );
        fwrite( array, 200, channel );
        fclose( channel );
     }

     sort(a,n)
     int a[], n;
     {
        int temp, i, j;

        # each pass moves the largest remaining value to the end
        for ( i=0; i<n-1; ++i )
        {
           for ( j=0; j<n-1-i; ++j )
           {
              if ( a[j] > a[j+1] )
              {
                 temp=a[j];
                 a[j]=a[j+1];
                 a[j+1]=temp;
              }
           }
        }
     }

This program reads an array of 100 numbers from a file, sorts them in numeric order and writes them back to the file. Note that we asked "fread()" and "fwrite()" for 200 bytes. Since the array consists of 100 integers and each integer is 2 bytes, the array is 200 bytes long.

Also, be sure to close a file when it is no longer needed. If we had neglected to close the file after the "fopen()" for reading, the second call to "fopen()" would have altered our file pointer. The value of the first file pointer would have been destroyed and the internal file control data structure would have been lost in limbo forever. Although this wouldn't have caused any damage, it is very sloppy programming. Keep in mind that you have only 10 file control data structures available.

16.6 fseek() and ftell()

All of the file read/write functions advance an invisible "file position pointer" which determines where in the file the next character will be read from or written to. Sometimes it is necessary to re-read a character or group of characters in a file, or to write over the current contents in a file with new data.
The function "fseek()" can be used to relocate the file position pointer to anywhere within the file, and allows you to re-read or re-write data in the file as necessary. Ftell simply returns the current value of the file position pointer. Examine the following sample program:

     link()
     {
        int start, current, inchannel, outchannel, c;

        inchannel = fopen( "RAW.DAT", "r" );
        outchannel = fopen( "LINKED.DAT", "wr" );
        start = 0;
        fwrite( &start, 2, outchannel );
        while ( (c=fgetc( inchannel )) != -1 )
        {
           if ( fputc( c, outchannel ) == 0x00ff )
           {
              current = ftell( outchannel );
              fseek( outchannel, start, 0 );
              fwrite( &current, 2, outchannel );
              fseek( outchannel, current, 0 );
              start = current;
           }
        }
        fclose( inchannel );
        fclose( outchannel );
     }

This program copies the file RAW.DAT to LINKED.DAT. Each time a byte of all ones (FF hexadecimal) is encountered in RAW.DAT, the program backs up to the previous start location in LINKED.DAT (indicated in the variable "start") and inserts the file's current file position pointer ("current"). In other words, the program creates a file identical to RAW.DAT except that the data in the file contains information that tells where all of the 0xff's are located within the file - a linked list.

Since SCI only supports integer variables, this limits the maximum range of absolute file positioning available with "fseek()" to 32767 from the beginning or end of the file. You can, however, position the file pointer to within +/- 32767 bytes from the current position. This allows you to position the file pointer anywhere within the file, no matter how large the file is.

Standard C uses "long" data variables instead of "int"s for specifying the file position offset. Longs are usually twice the size of an "int" (four bytes instead of two), which gives you a much larger range of absolute file positioning.
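To show the "long" offsets in use, here is a sketch for a standard C compiler (not SCI) that writes a few bytes, moves the file position pointer back with "fseek()", overwrites one byte and then re-reads it. In standard C the third argument of "fseek()" is one of the names SEEK_SET, SEEK_CUR or SEEK_END; SEEK_SET plays the role of the 0 used in the "link()" example. The function name and the use of "tmpfile()" for a scratch file are choices made for this example.

```c
#include <stdio.h>

/* Writes "abcdef" to a temporary file, overwrites the byte at
   offset 2, then steps back and re-reads it.
   Returns the byte read, or -1 on failure. */
int overwrite_and_reread(void)
{
    FILE *fp = tmpfile();            /* scratch file, open for read and write */
    long end;
    int c;

    if (fp == NULL) return -1;
    fputs("abcdef", fp);
    end = ftell(fp);                 /* position just past the last byte */

    fseek(fp, 2L, SEEK_SET);         /* absolute: 2 bytes from the start */
    fputc('X', fp);                  /* replaces the 'c' */

    fseek(fp, -1L, SEEK_CUR);        /* relative: back up one byte */
    c = fgetc(fp);

    fclose(fp);                      /* the temporary file is removed */
    return (end == 6L) ? c : -1;
}
```

The call to "fseek()" between the write and the read is not optional: standard C requires a repositioning call (or a flush) when switching from writing to reading on the same file.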
17. The Debugger

SCI provides a powerful program debugging facility that allows you to execute your programs with complete control. When the debugger is active, it takes control of your program and allows you to step through the program in a controlled fashion. You have the option of either executing a line at a time, or stopping at any line in the program. From the debugger you can also examine and change program variables in the middle of a program run, or execute any valid C statement.

Besides being a reference for the SCI debug facility, this section will introduce you to general debugging strategies, and show you how you can use the SCI debugger to gain a better knowledge of C program flow.

17.1 Introduction

The SCI debugger operates in what is known as "symbolic" mode. As you probably already know, a computer can not directly execute program instructions written in the C (or any other higher-level) language. The C language instructions must first be converted to machine language and then executed by the computer. This is the mode of operation when using a C compiler. Alternatively, the C source code can be directly executed by a program known as an "interpreter", which is exactly how SCI works.

A C program that has been compiled is completely unreadable by us humans - all resemblance to the original C code has been stripped from the program since it is intended only for the computer's "eyes". We say that the "symbolic" representation of a machine readable program has been removed. On the other hand, since an interpreter always keeps a "symbolic" (human readable) form of your program in memory, it is very easy to follow the program as it is being executed by the interpreter. A debugger that has this ability to let the human reader follow along as the program is executed by the computer is known as a "symbolic debugger".
There are basically two types of program errors that can occur: unrecoverable and recoverable. Unrecoverable errors are typified by the computer's refusal to answer to the programmer's desperate pounding on the keyboard - we say that the computer has "locked up" and gone south for the winter. These errors may be caused by partial or complete destruction of the program itself, or of the operating system, and usually require you to turn the computer off and then on again.

Recoverable errors are the kind which do not allow the program to run to normal completion, and which either return you to the operating system or to the calling program, or simply "get stuck" in a never ending loop. The category of recoverable errors also includes incorrect results:

     YOU: what is 2 plus 2?
     COMPUTER: 5

and unexpected results:

     COMPUTER: shall I delete this file?
     YOU: No
     COMPUTER: OK, file deleted!

For obvious reasons, SCI's built-in debugger is only capable of dealing with recoverable errors.

The approach to finding both of these types of errors is basically the same: allow the program to run normally up to the point just before it goes berserk, then stop and look at how it got there. Usually, the hardest task is finding that point where your program goes over the edge. You have two choices: either run the program from the very beginning, one line at a time until something unexpected happens, or allow the program to run normally and stop just before the section of code that is suspect. The method of executing a program a line at a time is known as "single-stepping". Running a program normally and having it stop at a given line is known as "running to breakpoint". The SCI debugger allows you to use both of these approaches in any combination.
17.2 Enabling the Debugger

The Library Function "debug()" is used to turn the SCI debugger on and off and may be called either from the shell prompt or from within your program. A single argument to the "debug()" function determines the debugger mode. If the argument is zero (debug mode 0), the debugger is completely disabled. If the argument is a one (debug mode 1), the debugger will only grab control of a running program if you hit the <ESCAPE> key from the keyboard, otherwise the program will run normally; if the debug mode is 2 or greater, the debugger is always in control of your program. Thus, your program might look like this:

     func()
     {
        int i;
        .
        .
        .
        debug(2);      # turn debugger ON
        while(i<10)    # scrutinize this loop
        {
           .
           .
           .
        }
        debug(1);      # turn debugger OFF again
     }

You can also turn the debugger on directly from the keyboard while a program is running. Pressing the <ESCAPE> key in debug mode 1 stops the program in mid execution and turns the debugger on. Thus, if you have a program that seems to be stuck in a forever loop, you can get control and take a look at what's causing the problem.

When the debugger gets control of your program, it will automatically display the line number and program text of the next line to be executed. It is important to realize that the displayed program line has not yet been executed. Directly below the displayed line is a circumflex that points to the first item in the line that will be executed. For example, if you had more than one statement on a single line, you might see:

     12: i = 10; putd(i);
                 ^

The debugger then displays its "debug>" prompt and waits for you to enter a command. Debugger commands always start with a dot (.) in the first column, followed by a command mnemonic letter.
You can also enter a C statement at the debugger's "debug>" prompt and have it evaluated and the results displayed, just like in the shell. We will now walk through a sample program using the debugger as a way of introducing you to the debugger commands.

17.3 Sample Debug Session

If you haven't done so already, list the sample program that came with your distribution disk, CALC.SCI, either on your printer or "TYPE" it out on your screen. This is a simple integer calculator program that does addition, subtraction, multiplication and division.

At the shell prompt, load CALC.SCI, then type "calc()" to start the program. The program displays its prompt (->) and waits for you to enter a command. Try entering some mathematical expressions:

     shell> load calc.sci
     shell> calc()
     -> 2+2
     4
     -> 2+3*4
     14
     -> 3*4+2
     14
     -> 2-30/2
     -13

Notice that the program is smart enough to know that multiplication and division have higher precedence than addition and subtraction. To get out of the program and back to the shell, type a carriage return:

     ->
     0
     shell>

Now, let's try the same scenario, but this time turn on the SCI debugger before you start the calculator:

     shell> debug(2)
     0
     shell> calc()
     calc()
     ^
     debug>

The debugger displays the C statement you entered on the shell's command line, prompts you with its "debug>" prompt and waits for you to enter a command. Now just type a few <RETURN>'s:

     debug>
     7:calc()
       ^
     debug>
     8:{
       ^
     debug>
     12:   Stacktop = 10;
           ^
     debug>
     13:   for(;;)
           ^
     debug>

The program is being executed one line at a time each time a <RETURN> is hit. Notice that each line is displayed preceded by its line number in the program. The statement you entered from the shell that started up the calculator is not a part of the program. This is why it was not preceded by a line number.
17.3.1 Exiting the Debugger

To halt the program and return to the shell, use the debugger's "quit" command:

     debug> .q
     0
     shell>

Now we're back at the shell's prompt. The ".q" command stopped the program and turned the debugger off. Since we want to experiment some more, turn the debugger back on again and start up the program:

     shell> debug(2)
     0
     shell> calc()
     calc()
     ^
     debug>

17.3.2 Single Stepping

You can execute more than one program line at a time with the "step" command. At the "debug>" prompt, type ".s" followed by the number of program lines you want to execute:

     debug> .s 4
     13:   for(;;)
           ^
     debug>

The debugger executed 4 lines, then stopped. This is useful when you want to get through a section of code quickly without having to hit <RETURN> and wait for each line to be displayed.

The "continue" command is equivalent to a ".s" with an infinitely large step count:

     debug> .c
     -> 2+2
     4
     ->

The program appears to be running slower than when it was run with the debugger turned off. This is because the debugger is still in control of the program, and must examine each line before it is executed. The usefulness of the ".c" command will become apparent later when we discuss breakpoints.

To return to the debugger prompt, hit <ESCAPE> while the program is running. If the program is waiting for input from the console, hitting the <ESCAPE> key will have no effect. So type some mathematical expression as before, hit <RETURN> and then quickly hit the <ESCAPE> key:

     -> 2+3*4
     interrupt
     66:   for(;;)
           ^
     debug>

When <ESCAPE> is hit, the debugger displays an "interrupt" message, followed by the program line it was currently working on.
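One way to picture the relationship between a bare <RETURN>, ".s" and ".c" is to translate each of them into "how many program lines may run before the debugger stops again". The following standard C sketch does just that. It is not SCI code: the function lines_to_run and the STEP_QUIT and STEP_UNKNOWN codes are invented here, "infinitely large" is approximated by INT_MAX, and treating ".s" without a count as a single step is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define STEP_QUIT    (-1)   /* ".q": stop the program               */
#define STEP_UNKNOWN (-2)   /* anything this sketch does not handle */

/* Translate one line typed at "debug>" into a count of program
 * lines to execute before the debugger prompts again. */
int lines_to_run(const char *cmd)
{
    assert(cmd != NULL);
    if (cmd[0] == '\0')
        return 1;                  /* bare <RETURN>: single step      */
    if (cmd[0] != '.')
        return STEP_UNKNOWN;       /* e.g. a C statement to evaluate  */
    switch (cmd[1]) {
    case 's': {
        int n = atoi(cmd + 2);     /* ".s 4" gives 4; atoi skips blanks */
        return n > 0 ? n : 1;      /* assumption: no count means 1    */
    }
    case 'c':
        return INT_MAX;            /* "infinitely large" step count   */
    case 'q':
        return STEP_QUIT;
    default:
        return STEP_UNKNOWN;
    }
}
```

Seen this way, ".c" is nothing special: it is a step whose count never runs out unless something else, such as <ESCAPE>, stops the program first.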
17.3.3 Displaying Global Variables

At any time the debugger is waiting for input, you may display all of the program's global variables and their contents with the "global" command:

     debug> .g
     char *Lineptr @9756: = "+3*4"
     int Stack[10] = 2 0 0 0 0 0 0 0 0 0
     int Stackptr = 1
     int Stacktop = 10
     debug>

The ".g" command displays each variable along with its data type ("char" or "int"). If the variable is an array or a pointer, its address is also printed in decimal, for example: @9756. Following that, the variable's value is displayed. If the variable is an array or a pointer, the first ten items in the array are displayed. Character arrays are printed as strings, and integer arrays as a series of decimal numbers.

Another form of the ".g" command, ".G", will display all of the program's functions and their program line numbers in addition to the global variables:

     debug> .G
     char *Lineptr @9756: = "+3*4"
     int Stack[10] = 2 0 0 0 0 0 0 0 0 0
     int Stackptr = 1
     int Stacktop = 10
     7:calc()
     26:number()
     36:addition()
     61:multiplication()
     86:push( n )
     93:pop()
     100:isdigit( c )
     debug>

17.3.4 Breakpoints

Next we will discuss one of the most powerful features of the debugger: breakpoints. Let's say we wanted to stop the program every time the functions "push" and "pop" were called, so that we could inspect the program's state. Looking at the debugger's output from the ".G" command above, we see that these functions are located at lines 86 and 93 respectively. To set breakpoints at these line numbers, we would enter the following two commands:

     debug> .b 86
     breakpoint set:
     86:push( n )
     debug> .b 93
     breakpoint set:
     93:pop()
     debug>

The debugger prints the program line at which the breakpoint is set for verification. You may set a maximum of 5 breakpoints at any one time.
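A fixed limit of 5 breakpoints suggests a small table of line numbers. The standard C sketch below shows one way such a table could work; it is not the SCI implementation. The names set_breakpoint, is_breakpoint and MAX_BREAKPOINTS are invented, and quietly accepting a line that is already in the table is an assumption.

```c
#include <assert.h>

#define MAX_BREAKPOINTS 5          /* limit stated in the manual */

static int breakpoints[MAX_BREAKPOINTS];
static int breakpoint_count = 0;

/* Returns 1 if a breakpoint is set at the given program line. */
int is_breakpoint(int line)
{
    int i;
    for (i = 0; i < breakpoint_count; i++)
        if (breakpoints[i] == line)
            return 1;
    return 0;
}

/* Records a breakpoint; returns 1 on success, 0 if the table is full. */
int set_breakpoint(int line)
{
    assert(line > 0);              /* program lines are numbered from 1 */
    if (is_breakpoint(line))
        return 1;                  /* assumption: duplicates are ignored */
    if (breakpoint_count == MAX_BREAKPOINTS)
        return 0;
    breakpoints[breakpoint_count++] = line;
    return 1;
}
```

While a program runs under the debugger, a check like is_breakpoint() would be made before each line is executed, which is also why the program runs more slowly than usual.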
To display all of the breakpoints that are currently set, use the ".B" command:

     debug> .B
     86:push( n )
     93:pop()
     debug>

Now we can continue executing the program normally and it should stop as soon as either the "push" or "pop" function is called. This is where the ".c" command is used:

     debug> .c
     breakpoint:
     86:push( n )
     ^
     debug>

As soon as a breakpoint is reached, the debugger announces this fact and displays the program line at the breakpoint.

To delete a breakpoint, use the "delete breakpoint" command:

     debug> .d 86
     breakpoint deleted:
     86:push( n )
     debug>

Again, the program line at which the breakpoint was deleted is displayed for verification. When a breakpoint is deleted, the debugger will no longer stop at this program line after a ".c" command.

To delete all of the breakpoints that are set, use the ".D" command:

     debug> .D
     all breakpoints deleted
     debug>

17.3.5 Function Call Trace Back

Using breakpoints we can be certain of only one fact: the program started at point "A", and stopped at point "B". We know nothing about the route it took in getting there. The debugger's "trace back" command at least tells us the order of function calls that got us to point "B". Continuing with our debugging session, type the following command:

     debug> .t
     26:number()
     61:multiplication()
     36:addition()
     7:calc()
     debug>

The function call trace back printed by the ".t" command is read backwards, from bottom to top. In other words, in the above display the function "calc" (which is the starting point) called "addition", which in turn called "multiplication", and so on.

A variation of the "trace back" command, ".T", will also display all local variables and their contents for each function in the trace back:

     debug> .T
     26:number()
     61:multiplication()
          int num = 0
     36:addition()
          int num = 0
     7:calc()
          char line[80] = "2+3*4"
     debug>

The local variables are displayed in a format similar to that of the ".g" command.
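The bottom-to-top reading order of the trace back falls out naturally if the active function calls are kept on a stack and printed from the most recent entry downwards. The standard C sketch below illustrates the idea only; it says nothing about how SCI actually records calls. The names enter_function, leave_function and trace_back, and the limits MAX_DEPTH and TRACE_SIZE, are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH  32              /* arbitrary limit for this sketch */
#define TRACE_SIZE 512

struct frame { int line; const char *name; };

static struct frame call_stack[MAX_DEPTH];
static int depth = 0;
static char trace_buf[TRACE_SIZE];

/* Record that the function declared at `line` has been called. */
void enter_function(int line, const char *name)
{
    assert(depth < MAX_DEPTH && name != NULL);
    call_stack[depth].line = line;
    call_stack[depth].name = name;
    depth++;
}

/* Record that the most recently called function has returned. */
void leave_function(void)
{
    assert(depth > 0);
    depth--;
}

/* Build the trace back text, innermost function first. */
const char *trace_back(void)
{
    size_t used = 0;
    int i;

    trace_buf[0] = '\0';
    for (i = depth - 1; i >= 0 && used < sizeof trace_buf; i--) {
        int n = snprintf(trace_buf + used, sizeof trace_buf - used,
                         "%d:%s\n", call_stack[i].line, call_stack[i].name);
        if (n < 0)
            break;                 /* formatting error: stop early */
        used += (size_t)n;
    }
    return trace_buf;
}
```

Calling enter_function() for "calc()", "addition()", "multiplication()" and "number()" in that order and then calling trace_back() yields the four lines in the same order as the ".t" display above.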
17.3.6 Examine a Program

You may also use the SCI editor to examine your program. The editor will not allow you to make any changes when invoked from the debugger, since this could completely alter the state of the current program run. To "examine" your program, type:

     debug> .e

The screen is erased and the editor is started up with the cursor resting on the line in the program that is to be executed next. You may move about freely in the editor, but you may not make any changes. When you exit the editor (with a ^Z) the debugger knows which line the cursor was on when you left the editor, and you may set a breakpoint at that line by just giving the ".b" command without a line number.

18. The Shell

This section will discuss in detail the operation of the program found in the file SHELL.SCI. We will also show you how to customize the shell program to suit your needs.

If you haven't done so already, print out the shell program file or "TYPE" it out on your console screen. As you can see, there are basically 2 sections in this program. The first section declares all of the Library Functions ("sys" call interfaces). The second section starts immediately after the "entry" keyword with the function "main()". This is the function that is executed after SHELL.SCI has been loaded into memory. Let's examine this function more closely now:

     . . .
     46:entry
     47:main()
     48:{
     49:   int f, t;
     50:   char buf[24];
     51:   char line[81];
     52:   char program[ memleft()-1024 ];
     53:
     54:   puts(sys(0));
     55:   puts("\nSCI Shell V1.5 20Oct86 Copyright (C) 1986 Bob Brodt\n");
     56:   *program='Z';
     57:   _mhz=12;
     58:
     59:   _nr=25; _nc=80;
     60:   _ro=_co=1;
     61:   _cp="\033[%d;%dH";
     62:   _el="\033[K";
     63:
     64:   for(;;) {
     65:      puts("shell> ");
     66:      line[5]=0;
     67:      if(gets(line)) {
     68:         if (!strncmp(line,"edit",4))
     69:            sys(atoi(line+4),program,19);
     70:         else if (!strncmp(line,"list",4)) {
     71:            f=1;
     72:            t=32765;
     73:            if(line[4])
     74:               sscanf(line+4,"%d %d",&f,&t);
     75:            sys(program,f,t,27);
     76:         }
     77:         else if (!strncmp(line,"save",4))
     78:            sys(line+5,program,26);
     79:         else if (!strncmp(line,"load",4))
     80:            sys(line+5,program,25);
     81:         else if (!strncmp(line,"exit",4))
     82:            return;
     83:         else if (!strncmp(line,"dir",3)) {
     84:            if ( !line[3] )
     85:               strcpy(line+4,"*.*");
     86:            if ( dirscan(line+4,buf) ) {
     87:               printf("%s\n",buf);
     88:               while(dirscan(0,buf))
     89:                  printf("%s\n",buf);
     90:            }
     91:         }
     92:         else
     93:            printf("\n%d\n",sys(line,program,16));
     94:      }
     95:   }
     96:}

Note that we have included line numbers here for reference. The data declarations at lines 51 and 52 are the shell's input line buffer (line[]) and user program buffer (program[]) respectively. Lines 54 and 55 print the program identification banners. Lines 57 through 62 assign the editor's customization variables (for the IBM PC in this version).

Line 64 starts a "forever" loop that can only be terminated by the "return" at line 82. This loop starts out by displaying the shell prompt ("shell> ") on the console screen, then waits for a line of input from the console keyboard.

The input line buffer stores the line read in from the console at line 67. It is then compared to each of the strings "edit", "list", "save", "load", "exit" and "dir".
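The chain of comparisons in the listing can be tried out on its own in standard C. The sketch below keeps the same strncmp() tests in the same order but, instead of making "sys" calls, simply reports which branch would be taken. The enum shell_command and the function classify() are invented for this illustration and are not part of SHELL.SCI.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical labels for the branches of the shell's if-else chain. */
enum shell_command {
    CMD_EDIT, CMD_LIST, CMD_SAVE, CMD_LOAD, CMD_EXIT, CMD_DIR,
    CMD_STATEMENT                  /* anything else: a C statement */
};

/* Decide which branch of the shell loop an input line would take. */
enum shell_command classify(const char *line)
{
    assert(line != NULL);
    if (!strncmp(line, "edit", 4)) return CMD_EDIT;
    if (!strncmp(line, "list", 4)) return CMD_LIST;
    if (!strncmp(line, "save", 4)) return CMD_SAVE;
    if (!strncmp(line, "load", 4)) return CMD_LOAD;
    if (!strncmp(line, "exit", 4)) return CMD_EXIT;
    if (!strncmp(line, "dir", 3))  return CMD_DIR;
    return CMD_STATEMENT;
}
```

Since only the leading characters are compared, a line such as "listing = 1;" is classified as the "list" command rather than as a C statement. The same would be true of the tests in the listing, so it is worth keeping in mind when naming variables you intend to use from the shell prompt.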
If the leading characters of the input buffer (four, or three in the case of "dir") don't match any of these strings, the line is assumed to be a C statement and is handed off to the interpreter (via "sys" function 16) for execution at line 93.

The program buffer, "program", is used to store the user's program functions and variables. The user can enter data into this buffer only by way of the SCI program editor ("sys" function 19). In fact, the program buffer is completely hidden from the user - any attempt to reference it via a C statement (for example "putchar( program[0] )") will result in an "undefined symbol" error message.

The user's program in the "program" buffer is in "tokenized" form. That is, each language element (variables, keywords, punctuation, etc.) has been encoded so that it can be more easily and quickly recognized by the interpreter. The tokenized form of a program bears almost no resemblance to the human-readable form and should not be tampered with.

18.1 Customizing the Shell

Now we will show you how you can customize this program. Start up SCI and when the shell's prompt appears, enter the command:

     shell> load shell.sci

to load the shell file. Now edit the program with SCI's editor and remove all lines from the beginning of the program up to and including the "entry" keyword. Next let's change the string on line 65 (above) to something like: "yes, dear? ". Exit the editor and from the shell prompt type:

     shell> main()

You should see the program identification banner again and the new shell's prompt, "yes, dear? "! You can now do everything from this new shell that you did from the original shell - write programs with the editor, save them, list them and load them. When you type "exit" to this new shell, however, you are returned to the original shell.

Type "exit" now to get back to the first shell and from there do a "save newshell".
Then type "main()" again to get the "yes, dear? " shell. From here, type "load newshell" to load the newshell program. Edit the newshell program and change the "yes, dear? " prompt to something like: "you again? ". Now exit the editor and at the "yes, dear? " prompt type "main()". You should again see the program identification banner and the new shell prompt "you again? ", like this:

     yes, dear? main()
     Small C Interpreter V1.5 20Oct86 Copyright (C) 1986 Bob Brodt
     SCI Shell V1.5 20Oct86 Copyright (C) 1986 Bob Brodt
     you again?

In all, we now have 3 different shell programs running, one on top of the other, and we could actually continue doing this until we run out of memory! This is exactly analogous to the layers of an onion: each layer gets smaller as you go towards the center of the onion, just as the amount of usable memory becomes less as each new shell program is loaded from the previous shell. Now return to the original shell like so:

     you again? exit
     0
     yes, dear? exit
     0
     shell>

It now becomes an easy task to customize the shell program to your heart's content using the SCI editor, test it from the SCI interpreter environment and, when it's fully debugged, save it to disk. Of course you must remember to insert the Library Function declarations and the "entry" keyword before the new shell "main()" function if you intend to replace SHELL.SCI with the new program.

18.2 DOS Command Line Arguments to the Shell

A mechanism has been provided to pass operating system command line arguments to the shell in a way similar to most commercially available C compilers. By specifying a "-A" on the MS-DOS command line, all arguments to the right of the "-A" will be ignored by SCI and instead passed to the shell program.
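To make the effect of "-A" concrete, here is a sketch in standard C of how a command line could be divided at that option. It is only an illustration of the rule just stated, not SCI's own startup code: the function name shell_arg_start is invented, and the exact, case-sensitive comparison with "-A" is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Returns the index of the first command line argument that belongs
 * to the shell program, i.e. the one just after "-A". If there is no
 * "-A", argc is returned, meaning "no shell arguments". */
int shell_arg_start(int argc, char **argv)
{
    int i;

    assert(argv != NULL);
    for (i = 1; i < argc; i++)         /* argv[0] is the program name */
        if (strcmp(argv[i], "-A") == 0)
            return i + 1;
    return argc;
}
```

With start = shell_arg_start(argc, argv), the shell program would receive argc - start as its argument count and argv + start as its argument array. For the command line SCI -A Hello out there! this gives a count of 3 and the strings "Hello", "out" and "there!".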
SCI always passes two arguments to the "entry" program in the startup file, although the program is free to use or ignore these arguments. These are: a count of the number of arguments following the "-A" option on the DOS command line, and a pointer to the array of strings that contain these arguments. These arguments are commonly declared as "argc" (argument count) and "argv" (argument vector) in the C community.

Make the following changes and additions to the program in SHELL.SCI:

     . . .
     main(argc, argv)
     int argc;
     char **argv;
     {
          int i;

          i = 0;
          while ( i<argc )
          {
               puts(argv[i++]);
               putchar('\n');
          }
          . . .
     }

Then, when the following command is entered at the operating system level:

     A>SCI -A Hello out there!

the shell program would start up like this:

     Hello
     out
     there!
     Small C Interpreter V1.5 20Oct86 Copyright (C) 1986 Bob Brodt
     SCI Shell V1.5 20Oct86 Copyright (C) 1986 Bob Brodt
     shell>

CONTENTS

      1. Introduction to SCI Programming
      2. SCI Statement Structure
      3. SCI Program Structure
      4. Functions
          4.1 Library Functions
      5. Your First Program
          5.1 Hello again, world!
          5.2 Fahrenheit to Celsius
      6. Statements: Simple and Compound
          6.1 Comment Statements
      7. Expressions
          7.1 Operators
          7.2 Precedence
          7.3 Associativity
          7.4 Arithmetic operators
          7.5 Bitwise Operators
      8. Variables
          8.1 Naming Conventions
          8.2 Data Types
          8.3 Scope
          8.4 Location of Variables
      9. Constants
          9.1 Hexadecimal Constants
          9.2 Octal Constants
          9.3 ASCII Character Constants
          9.4 String Constants
     10. Assignment Operator
          10.1 Lvalues and Rvalues
     11. Comma Operator
     12. Flow Control
          12.1 if and if-else
          12.2 while
          12.3 for
          12.4 switch
     13. Arrays
     14. Pointers
          14.1 Lvalues and Rvalues Revisited
          14.2 Pointer Operator
          14.3 Address Operator
     15. Increment and Decrement Operators
     16. A Tour Through the File I/O Functions
          16.1 fopen()
          16.2 fclose()
          16.3 fgetc() and fputc()
          16.4 fgets() and fputs()
          16.5 fread() and fwrite()
          16.6 fseek() and ftell()
     17. The Debugger
          17.1 Introduction
          17.2 Enabling the Debugger
          17.3 Sample Debug Session
     18. The Shell
          18.1 Customizing the Shell
          18.2 DOS Command Line Arguments to the Shell